A Vocational Interest Test at the Skilled Trades Level *


Kenneth E. Clark 
University of Minnesota 


The counseling of college students who plan to enter one of the pro- 
fessional fields has been greatly aided by the development of the Strong 
Vocational Interest Blank and the Kuder Preference Record. Widespread 
use of these devices not only with college students but with high school 
students, job applicants, the unemployed, and other groups has demon- 
strated the usefulness of a measure of an individual’s interests in com- 
parison with those of successful workers in a given occupation. One of 
the serious limitations of these instruments, however, is the inadequate 
coverage of occupations at the skilled and semi-skilled levels. Thus, 
the Strong Vocational Interest Blank can be scored only for carpenter, 
printer, and policeman at these occupational levels.¹ As a result, the
vocational counselor is much better prepared to counsel the small minority 
of potential professional, semi-professional and technical workers than to 
counsel the large majority of persons planning to enter skilled, semi- 
skilled, and unskilled occupations—at least as far as the measurement 
of interest patterns is concerned. 

During World War II, the armed services placed great emphasis on 
the measurement of aptitudes; little was placed on the measurement 
of interests. It frequently happened that, when highly capable men 
were sent to technical schools for training, school officials would often 
complain that the students were not “interested.” That it would have

* This research was carried out under Contract N6ori-212, T. O. III, between the 
Office of Naval Research and the University of Minnesota. This paper is based on 
Report No. 1 under that contract. The writer wishes to acknowledge the work of Mr. 
Herbert S. Klapper, Mrs. Patricia Hayes, and Mr. Robert I. Hudson in the collection 
and analysis of data, and in the preparation of this report. The assistance of Professors 
Donald G. Paterson and John G. Darley was invaluable both in the planning of various 
aspects of this program, and in the critical reading of the manuscript.

1 The trade unions used in this report are members of the St. Paul Trades and Labor 
Assembly, and include Bakery and Confectionery Workers, No. 21; Electrical Workers, 
No. B-110; Milk Driver Employees, No. 546; Painters, No. 61; Plasterers and Cement 
Finishers, No. 20; Plumbers, No. 34; Sheet Metal Workers, No. 76; Typographical 
Union, No. 30; and Steam Fitters-Pipe Fitters, No. 455. 
been desirable to pay more attention to measured interests of individuals
was generally recognized. To actually do so, in practice, was rather 
difficult. For one thing, military terminology is strange to the newly 
inducted recruit. To ask for statements of job preferences in terms of 
job titles is therefore likely to be futile. To ask for a statement of 
preferences in terms of definite types of activities is also likely to obtain 
information of doubtful value either from a civilian or a military re- 
spondent. Even were such an approach considered desirable, it is 
likely that the high level of affect among recruits would lead them to 
state preferences in terms of assignments which either keep them closer 
to home, keep them in the continental United States longer, or either 
reduce or increase their likelihood of being assigned to combat duty. The 
use of a questionnaire which could be scored to indicate the interests of an 
individual in terms of the known interest patterns of members of military 
occupational groups was not possible because such an instrument did not 
exist. It is the purpose of the present investigation to explore the 
possibilities of developing such an interest measure, usable for potential 
workers both in the occupations of enlisted men in the armed services, 
and in the corresponding civilian occupations. 


The Questionnaire 


To provide the information on preferences needed for the analysis of
interest patterns, a 570-item questionnaire, the Minnesota Vocational
Interest Inventory, was prepared. Items in the questionnaire were grouped
in three’s, making up a total of 190 triads. The individual respondent
is asked to select from each triad of items the one activity he would like 
most, and the one he would like least, leaving the third item blank. The 
approach used is thus a forced-choice, with the respondent who follows 
directions being required to make a total of 380 choices, half of them 
“like” and half of them “dislike.” 
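The response format just described can be illustrated with a minimal sketch. The names and the encoding of marks below are hypothetical, chosen only for illustration; the counts follow from the figures given above (570 items in triads of three, with one “like” and one “dislike” mark per triad).

```python
from collections import Counter

N_ITEMS = 570                          # items in the inventory (from the text)
ITEMS_PER_TRIAD = 3
N_TRIADS = N_ITEMS // ITEMS_PER_TRIAD  # 190 triads
N_CHOICES = 2 * N_TRIADS               # 380 marks: 190 "like" and 190 "dislike"


def is_valid_triad(marks):
    """Check one triad: exactly one '+', one '-', and one blank ('').

    `marks` is a sequence of three strings. This encoding is an assumption
    made for illustration, not the inventory's actual answer format.
    """
    return len(marks) == 3 and Counter(marks) == Counter({"+": 1, "-": 1, "": 1})
```

Under this sketch the example in the directions (blank, plus, minus) is a valid triad, while a triad carrying two plus marks is not.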

The items used in the inventory were selected from a variety of 
sources. First, a large number of items were written which described 
jobs or tasks making up part of a job.² The Dictionary of Occupational
Titles,³ the Manual of Navy Job Classifications (Nav Pers 15105)⁴ and
similar materials were scrutinized for suggestions for items. The final 
list of items used contains such activities as the following, grouped in 
three’s as shown, with the directions for marking responses as indicated: 

2 The writer would like to acknowledge the able assistance of Josephine Welch in
this part of the project.

3 Dictionary of Occupational Titles, Part I, Definitions of Titles. Washington:
Government Printing Office, 1939. Pp. 1-1040.

4 Manual of Enlisted Navy Job Classifications (Nav Pers 15105). Washington:
Bureau of Naval Personnel, 1945.
Directions


On the following pages you will find many activities listed. They are arranged in 
blocks of three. You must make a choice in each block of the one thing you like to 
do most, and the one thing you like to do least. 


Mark the thing you like to do most with a plus-sign (+).
Mark the thing you like to do least with a minus-sign (-).
That leaves one of three items blank.

Example: ( ) a. Write letters.
         (+) b. Fix a leaky faucet.
         (-) c. Interview someone for a newspaper story.


Now turn the page and begin. Be sure to fill out all pages. 


Items are grouped in three’s in a haphazard fashion. Thus, no a priori 
plan of scoring played any role in determining how items were combined to 
make triads. As a result, the blocks of three look like the following examples:


a. Be a grocer.                         a. Varnish a floor.
b. Be a printer.                        b. Learn to use a slide rule.
c. Be a shop foreman.                   c. Repair a broken connection on an
                                           electric iron.

a. Tune a piano.                        a. Putter around in a garden.
b. Cook a meal.                         b. Take part in an amateur contest.
c. Change a tire on an automobile.      c. Cook spaghetti.


It should be noted, however, that although items are not grouped in three’s 
in any pre-arranged manner with regard to possible responses of individuals in 
different occupations, nonetheless some attempt was made to keep the nature 
of the items within the same order of complexity. Thus, learning calculus is 
not combined with an item on cooking a meal, since these two items are not 
compatible in terms of the ordinary life situations which would present a 
choice between two such alternatives. Even this kind of control did not always 
operate, so that one discovers such an odd combination as the following in the 
finished questionnaire: a. Address envelopes; b. Try to find an error in a 
financial account; and c. Help put out the fire in a burning building. 

Several difficulties are encountered in an approach of this sort. The obvious 
one, and the one which produced most frequent comments from the skilled 
tradesmen who cooperated in this investigation, is that the combination of 
forced-choice and haphazard arrangement of items in groups produced many 
triads in which a decision is difficult. Thus, many highly masculine members
of the electrician’s group had difficulty finding an item to mark with a plus
(“like”) in the following group: a. Put a closet in order; b. File cards in
alphabetical order; and c. Make a pie. In spite of this difficulty, the writer believes
this method is still to be preferred to the more obvious type of choice which is 
made when a questionnaire’s content is “stacked.” 

Decisions regarding types of items, and mode of response, were made 
largely on the basis of rather meager experimental evidence, and on the basis 
of subjective appraisal of the types of items which would work best in the situa- 
tions where it is expected the inventory will be used. 


The Criterion Groups 


The plan of the present inquiry required that the questionnaire be 
administered to successful employed workers in the skilled trades
occupations. It was not considered desirable to prepare keys solely in terms of
rubrics identified by any other means. 


The first contacts with employed groups were made through various business
and industrial organizations in Minneapolis and St. Paul, Minnesota.⁵
Willingness to cooperate in the project was expressed by personnel managers 
of many of these concerns, with the reservation that the matter should be 
cleared with union representatives before any action was taken. Furthermore, 
some reluctance to use company time for the collection of data was expressed, 
along with assurances that this, indeed, was a worthy project. 

Union representatives were, accordingly, contacted, and the possibilities 
of the program described to them. Union leaders were, on the whole, willing 
and anxious to cooperate in any program which might eventually operate to 
increase their own effectiveness in selecting apprentices for training in their 
own trades. With only one exception, union groups who were contacted agreed 
to aid in the assay of interest patterns of their own memberships. 

How to obtain the responses of the membership still remained a major 
problem. The first attempt was attendance at union meetings. The program 
of research was presented to the membership with a request for their coopera- 
tion in responding to the questionnaire during the meeting itself. Inasmuch 
as completing the questionnaire required from 45 minutes to over an hour, this 
effort proved to be futile. Somehow or other union meetings were not con- 
sidered by the membership a suitable time for this sort of work, and as a result, 
many questionnaires were begun, but few were completed.

A second effort was made at the place of employment. A representative
of the project, accompanied by a union official, would make the rounds of 
places of business, would describe the research program to the worker, who 
would then fill out the questionnaire while our representative and the union 
representative waited. This method was most effective, although unpopular 
with the employer, and excessively time-consuming. 

A third attempt was by use of the mails. Through the trust and coopera- 
tion of the union leaders, it was possible to use their mailing lists, and to send 
questionnaires to the membership, with a covering letter signed by the business 
agent, or another official of the union. These letters asked for the cooperation 
of the membership, gave a brief word about the objectives of the program, and 
gave the endorsement of the local union officials to the project. A stamped 
envelope was enclosed, addressed to the union office. Returns were anony- 
mous, with questionnaires coded in such a way as to permit follow-ups only 
to those who had not yet returned their questionnaires. 

Of the various methods tried for obtaining adequate coverage of workers 
in a given occupational group, the mail questionnaire method proved to be 
most effective. Thus, while 3500 questionnaires were distributed in union 
meetings, with the membership voting the program “hearty support,” only 
129 usable returns were obtained. Mail questionnaires to 320 electricians 
yielded, with one follow-up letter, a return of 201 questionnaires—a 63 per cent 
return, of which most were usable. For the A. F. of L. trade unions used so 
far, questionnaires have been mailed to the entire male membership. Figures 
on percentages returned and usable are shown in Table 1. 

The sampling of occupational groups for the purpose of describing interest 
patterns requires that the segment used be representative of the entire adult 
group employed in the field. To what extent is this true of our groups? 
Several factors operate to bias our samples: 


5 The writer wishes to acknowledge the indispensable aid of Mr. Herbert S. Klapper
in the collection of data from employed skilled tradesmen. 
1. Geographically, the samples are highly restricted. Only if the St. Paul
skilled tradesmen are strictly representative of the skilled tradesmen in the 
same occupations all over the country will this bias be eliminated. It is in- 
tended that this geographic bias be reduced in later samples, not only by 
securing returns from Minneapolis workers, but by obtaining samples in other 
localities.⁶

2. Only skilled tradesmen who are members of St. Paul locals of the 
American Federation of Labor are included in the samples. This, obviously, 
is a serious source of bias, and one which will need to be corrected. This study 
is restricted to the A. F. of L. unions partly as a matter of expediency, and in 
part because these unions are trade unions, not industrial unions. Working 
with limited funds, this study could not even exploit all of the data-collecting 
opportunities provided by the St. Paul Trades and Labor Assembly of the 
A. F. of L., and so there was little point in diversifying the required contacts. 

3. All members of a particular union local did not return their question- 
naires, and so were never included in a sample. While a 100 per cent sample 
would have been ideal, it becomes much too expensive to even try. As noted 
in Table 1, the coverage of a particular union is in each instance better than 
fifty per cent, but never more than 75 per cent.⁷ It is likely that those workers
who responded to our mail appeals represent a different type of person from 
those who did not respond. How serious a bias this is cannot easily be assayed. 


It may be that these biases which affect all of the groups do not influence 
the difference between groups, since the preparation of occupational keys 
requires the determination of the interest patterns which differentiate one 
occupation from another. Thus, geographic location may produce a larger 
number of workers who say that they like to fish, but would not affect the size 
of the differences between groups in this expression of interest. 

The present report concerns itself with an analysis of the responses to the 
Minnesota Vocational Interest Inventory of workers in the eight civilian occupa- 
tional groups for whom samples of some size are available. These groups, and 
their sizes, are reported in the last column of Table 1. 


Development of a Tradesmen-In-General Group 


In order to score the responses of men in a given occupational group 
to show the items which are answered in the same way by these men, it is 
necessary to have some basis of comparison with persons not in that 
occupational group. Thus, it is not enough to know that 75 per cent of 
electricians respond to an item in the same way, for it may be that 75 
per cent of all men would respond in that way. To obtain a group of 
adults who would represent a cross-section of all adult men in the skilled 
trades has not been possible within the scope of this project. To obtain 
an estimate of the responses which such a group would make, the following 
procedure was employed: 


6 During World War I, and subsequent work of the Occupational Research Program 
Staff of the U. S. E. S., trade tests were standardized in three geographically separated 
localities to overcome possible local peculiarities in trade practices. 

7 Information given in Strong’s Vocational Interests of Men and Women does not
indicate what percentage returns he attained. However, it is probable that he seldom 
achieved a 75 per cent return, or even a 50 per cent return. 
a. The percentage response of members of each of the eight occupational
groups in the civilian trades being analysed (listed in Table 1) to every item of 
the inventory was computed. 

b. The percentage responses for each of the eight groups were added 
together and divided by eight, giving the average percentage response to each 
item. 

c. This average percentage response was used as the best estimate of the 
percentage response which would be made by members of an actual tradesman- 
in-general group. 

It should be noted that this procedure gives the identical values which 
would have been obtained if a representative and equal sample of the respon- 
dents in each of the eight civilian occupations had been used to make a single 
total group. The procedure operates to eliminate the over-weighting of occu- 
pational groups with a large number of members, and the under-weighting of 
occupational groups of small size. 
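The effect of the equal weighting described above can be seen in a small sketch. The function names and the two item percentages are hypothetical; only the group sizes (300 printers, 51 plasterers) are taken from Table 1. The sketch contrasts the unweighted average of group percentages, as in step b, with the percentage that would result from pooling all respondents.

```python
def unweighted_mean_pct(pct_by_group):
    """Average the group percentages, giving each occupation equal weight."""
    return sum(pct_by_group.values()) / len(pct_by_group)


def pooled_pct(pct_by_group, n_by_group):
    """Percentage obtained if all respondents were pooled (size-weighted)."""
    total = sum(n_by_group[g] for g in pct_by_group)
    return sum(pct_by_group[g] * n_by_group[g] for g in pct_by_group) / total


# Hypothetical percentages of "like" for one item; group sizes from Table 1.
pct = {"Printers": 80.0, "Plasterers": 20.0}
n = {"Printers": 300, "Plasterers": 51}

equal_weight = unweighted_mean_pct(pct)   # 50.0
size_weight = pooled_pct(pct, n)          # about 71.3, dominated by the printers
```

With only two groups the contrast is exaggerated, but it shows why simple pooling would let the largest unions dominate the tradesmen-in-general estimate.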

This estimate is obviously not an entirely satisfactory solution to the prob- 
lem of getting a tradesmen-in-general group which is truly representative. 
However, it seems likely that for the purposes for which the interest inventory 
is being developed, the procedure gives an adequate base for preliminary 
comparisons between groups. 





Table 1

Numbers of Questionnaires Mailed and Returned for the Eight A. F. of L. Unions
Sampled and Numbers Used in Developing Scoring Keys

                        Number    Number  Per Cent  Number  Per Cent Number Used
A. F. of L. Union         Sent  Returned  Returned  Usable    Usable   for Keys*

Electricians               320       201        63     166        83         185
Milk Wagon Drivers         608       326        54     218        67         127
Painters                   712       390        55     267        68         252
Plasterers                 167       111        66      74        67          51
Bakers                     473       305        64     144        47          64
Sheet Metal Workers        298       220        74     164        75          99
Printers                   530       331        62     278        84         300
Plumbers                   576       347        60     199        57          65

Total                                                                       1143

* N in this column is the number of persons in the different unions who identified
themselves as belonging to a particular occupational group. Thus a few electricians
were found in unions other than the electricians’ union itself; within the milk wagon
drivers’ union were workers who were not actually milk wagon drivers.
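The percentage columns of Table 1 follow from its counts, as the sketch below illustrates. The variable and function names are hypothetical, and it is assumed (the table does not say so explicitly) that “Per Cent Usable” is computed on the number returned rather than the number sent; that assumption reproduces the printed figures.

```python
# union: (sent, returned, usable, used_for_keys), counts as printed in Table 1
TABLE_1 = {
    "Electricians": (320, 201, 166, 185),
    "Milk Wagon Drivers": (608, 326, 218, 127),
    "Painters": (712, 390, 267, 252),
    "Plasterers": (167, 111, 74, 51),
    "Bakers": (473, 305, 144, 64),
    "Sheet Metal Workers": (298, 220, 164, 99),
    "Printers": (530, 331, 278, 300),
    "Plumbers": (576, 347, 199, 65),
}


def pct(part, whole):
    """Whole-number percentage of `part` in `whole`."""
    return round(100 * part / whole)


# (per cent returned, per cent usable) for each union
rates = {
    union: (pct(returned, sent), pct(usable, returned))
    for union, (sent, returned, usable, _) in TABLE_1.items()
}
total_used = sum(row[3] for row in TABLE_1.values())
```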


The Preparation of Scoring Keys 


The purpose of a scoring key for a particular occupation is to make possible 
the comparison of responses of an individual to the responses of members of 
a given occupational group, to determine whether or not the individual’s 
responses are like or unlike those of a given group. It is necessary, therefore, 
to compare the responses of the group to the responses made by the
tradesmen-in-general group. Consider the example below:





Item 18.  Percentage Responses of “Like” Made by:

                                   Tradesmen-in-general    Electricians
a. Be an electrical engineer               61%                 90%
b. Be an aeronautical engineer             20%                  5%
c. Be a surgeon                            19%                  5%


It is apparent that electricians have, as a group, a more general preference 
for being an electrical engineer than do tradesmen-in-general, although even 
the latter group selects this response more frequently than either of the other 
two responses. We may score a response of “like” to the response a. as a 
response counting towards a high score on the electricians’ key, since a sig- 
nificantly larger proportion of electricians pick this response than do the 
tradesmen-in-general. A response of “like” to either of the other two items, 
however, may be scored as a response detracting from a high electricians’ 
score, since a smaller proportion of electricians pick this response than do 
tradesmen-in-general. Thus, we might make up our electrician’s key as fol- 
lows: 


a. Be an electrical engineer       A response of + counts  1 point
b. Be an aeronautical engineer     A response of + counts -1 point
c. Be a surgeon                    A response of + counts -1 point


However, the respondent has also selected one of these three items as the 
one which he likes least, or dislikes most, and has marked that item with a 
— mark. Therefore these percentage responses must also be scrutinized. 


Item 18.  Percentage Responses of “Dislike” Made by:

                                   Tradesmen-in-general    Electricians
a. Be an electrical engineer               12%                  1%
b. Be an aeronautical engineer             21%                 21%
c. Be a surgeon                            67%                 78%


So in similar fashion, these responses may be scored as follows: 


a. Be an electrical engineer       A response of - counts -1 point
b. Be an aeronautical engineer     A response of - counts  0 points
c. Be a surgeon                    A response of - counts  1 point


Thus, to generalize—a response made more frequently by the members of a 
particular group than by the tradesmen-in-general group is scored as a plus 
in the key for that particular group. A response made less frequently by the 
members of a particular group than by the tradesmen-in-general group is 
scored as a minus in the key for that group. 
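As an illustration of how such a key might be applied to one respondent, the sketch below scores a single triad with the Item 18 weights given above. The data layout and the function name are hypothetical; the blank item contributes nothing.

```python
# Electricians' key for Item 18, copied from the weights in the text.
# Keys are (item letter, mark); '+' is "like" and '-' is "dislike".
ELECTRICIAN_KEY_ITEM_18 = {
    ("a", "+"): 1, ("b", "+"): -1, ("c", "+"): -1,
    ("a", "-"): -1, ("b", "-"): 0, ("c", "-"): 1,
}


def score_triad(key, liked, disliked):
    """Sum the key weights for the item marked '+' and the item marked '-'."""
    return key.get((liked, "+"), 0) + key.get((disliked, "-"), 0)


# Likes "electrical engineer", dislikes "surgeon": both count toward the key.
typical = score_triad(ELECTRICIAN_KEY_ITEM_18, liked="a", disliked="c")    # 2
# The reverse pattern counts against the key on both marks.
atypical = score_triad(ELECTRICIAN_KEY_ITEM_18, liked="c", disliked="a")   # -2
```

A respondent's total on a key would then be the sum of such triad scores over all triads, which is how a single item can move the total by as much as two points in either direction.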

How shall the blank item be scored? It is apparent that such a response 
has meaning, just as the indifferent response on the Strong Vocational Interest 
Blank has meaning. The present analysis, however, ignored the blank item 
in scoring for two basic reasons: 1. percentage responses of like and dislike 
reflect the effects of leaving items blank, and therefore use the blank response 
in scoring, and 2. the scoring of the absence of a pencil mark is a complicated 
task in hand-scoring, and an almost impossible task in machine scoring. 

How great a difference should be required? This question cannot be 
answered with finality. In this report, an 11 per cent difference was used as 
the minimum value required for using an item response in a given key. This 
value represents a compromise between the 6 per cent value used by Strong 
and the 20 per cent and higher values used with success by Hathaway and
McKinley⁸ in the development of the Minnesota Multiphasic Personality
Inventory. 
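The selection rule can be sketched as follows, using the Item 18 percentages quoted earlier. The function name and data layout are hypothetical, and the sketch assumes that a difference of exactly 11 percentage points qualifies, in keeping with the description of 11 per cent as the minimum value.

```python
def build_key(group_pct, general_pct, threshold=11):
    """Unit-weight key from percentage differences.

    A response is keyed +1 when the occupational group chooses it at least
    `threshold` points more often than tradesmen-in-general, -1 when at
    least `threshold` points less often, and is left out otherwise.
    """
    key = {}
    for response, pct in group_pct.items():
        diff = pct - general_pct[response]
        if diff >= threshold:
            key[response] = 1
        elif diff <= -threshold:
            key[response] = -1
    return key


# Item 18 percentages from the text; keys are (item letter, mark).
general = {("a", "+"): 61, ("b", "+"): 20, ("c", "+"): 19,
           ("a", "-"): 12, ("b", "-"): 21, ("c", "-"): 67}
electricians = {("a", "+"): 90, ("b", "+"): 5, ("c", "+"): 5,
                ("a", "-"): 1, ("b", "-"): 21, ("c", "-"): 78}

item_18_key = build_key(electricians, general)
```

The result reproduces the weights shown for Item 18: five keyed responses, with the “dislike” response to item b omitted because the two groups do not differ on it.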

Should greater percentage differences contribute more to total score than 
smaller ones? The Strong Vocational Interest Blank increases the differentia- 
tion between groups by assigning greater weights to responses which differ 
markedly in the criterion group from the responses made by men-in-general, 
and assigning smaller weights to responses where the difference is smaller. 
Preliminary data, in this study, however, indicate a possibility that use of 
multiple weights is not required to maximize the differentiation of occupational 
groups. 


Comparisons on the Eight Occupational Keys 


All members of the eight occupational groups were scored on the key 
for their own occupation. The distributions of these scores are listed 
in Table 2. Also presented in Table 2 are distributions of scores on each 
of these keys of persons not employed in the occupation. A comparison 
of each pair of distributions gives an indication of the degree to which the 
occupational scoring keys actually work in separating workers in a given 
field from persons in other skilled trades jobs. 

The distributions of scores of workers outside the occupation were 
obtained by scoring a sample of 25 inventories of workers from each of 
the other seven occupations. The selection of inventories was made on 
a random basis. A comparison of the distribution of scores of these 
samples of 25 with the distributions of the total group on its own occupa- 
tional key showed only slight differences. The scores of each of the 
combinations of seven groups (N = 175) not belonging to a given occupa- 
tion are distributed in Table 2 as the scores of tradesmen-in-general. 

Marked differences exist between the distributions of scores of non- 
members of an occupation and those of employed workers in the occupa- 
tion. Since the primary purpose of this investigation is to determine 
whether or not it is possible to separate members of skilled trades groups 
on the basis of their measured interests, a measure of overlap between
distributions of members and non-members is the appropriate statistic. 
Table 2 presents the per cent of the tradesmen-in-general exceeding the 
median of a given skilled group. This value varies for the different keys 
from 2.0% to 14.3%, with a median value of 6.3%. Thus, about six 
out of 100 workers not in a given occupation make scores above the median
of employed workers on the typical scoring key prepared in this study. 
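The overlap statistic can be sketched as below. The function name and the small score lists are hypothetical; the eight overlap values are those reported in Table 2, and their median is recomputed as a check on the figure of 6.3 quoted above.

```python
from statistics import median


def percent_overlap(member_scores, nonmember_scores):
    """Per cent of non-members scoring above the median of the members."""
    cutoff = median(member_scores)
    above = sum(1 for s in nonmember_scores if s > cutoff)
    return 100.0 * above / len(nonmember_scores)


# Hypothetical scores on one key, for illustration only.
members = [10, 20, 30, 40, 50]       # median 30
nonmembers = [0, 5, 10, 35]          # one of the four exceeds 30
example = percent_overlap(members, nonmembers)   # 25.0

# Per cent overlap for the eight keys, as reported in Table 2.
TABLE_2_OVERLAP = [3.4, 10.3, 6.3, 6.3, 2.0, 7.4, 14.3, 4.6]
typical_overlap = median(TABLE_2_OVERLAP)        # 6.3
```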

The percentage 6.3 does not compare too favorably with the values 
of two or three per cent obtained by Strong in his work with his Vocational 
Interest Blank at the professional levels. Does this difference indicate 
that trades groups are more nearly alike than are professional groups, 


8 S. R. Hathaway and J. C. McKinley. Minnesota Multiphasic Personality
Inventory, Manual. New York: The Psychological Corporation.
Table 2

Distributions of Scores of Workers in a Trade (Group A) and Tradesmen-in-General
(Group B) on Each of Eight Occupational Scoring Keys

                    Plasterers   Milk Wagon   Printers     Electricians Painters     Bakers       Sheet Metal  Plumbers
                                 Drivers                                                          Workers
Group:                  A     B      A     B      A     B      A     B      A     B      A     B      A     B      A     B

Score 
120-129 4 
110-119 26 = 1 4 
100-109 46 4 18 
90-99 38 14 133 1 12 9 
80-89 26 20 26 12 13 «17 
70-79 16 18 oo 2 6 OF 
60-69 1 12 16 20 25 3 28 
50-59 1 5 7 20 1 10 30 6 21 
4049 8 21 5 24 1 4 27 18 
30-39 16 6 21 2 4 14 1 13 19 2 21 
20-29 16 29 3 1 36 3 18 34 6 1 3 14 10 
10-19 7 52 15 1 51 3 12127 5 6 1 10 10 
0- 9 1 49 29 9 39 8 4 73 65 5 3 7 5 
—-10-—1 2 30 31 14 41 = 15 7 17 91 += «10 4 1 5 6 
—20-—11 9 22 19 44 27 1 1 1 14 OI 14 4 3 
—30-—21 16 30 25 22 1 6 12 
—40-—31 9 35 9 41 1 9 2 
—50-—41 2 40 7 36 8 28 
—60-—51 23 16 30 
—70-—61 3 2 21 
—80--—71 27 
—90-—81 4 


N                      51   175    127   175    300   175    185   175    252   175     64   175     99   175     65   175
Mdn.                   30    10     -5   -34      6   -32     96    82     13    -2    -12   -48     74    51     91    57

No. of Items in Key      131          144          168          226           66          212          168          211
Per Cent Overlap*        3.4         10.3          6.3          6.3          2.0          7.4         14.3          4.6

* Per cent of distribution of scores of tradesmen-in-general exceeding median of the
distribution of scores of members of a trades group.


or is there another explanation? The writer believes that the use of a 
constant eleven per cent difference for deciding to use or not use a given 
differential response in the scoring key does not give maximum separa- 
tions between criterion and control groups. This factor, plus the fact 
that multiple weights were not used in the present keys, but are used by
Strong, operates to increase percentage overlap. 

Inspection of Table 2 reveals that the eight keys differ not only 
in their power to separate members of a trade from outsiders, but also 
in the kinds of distributions which they produce. One is immediately 
struck by the differences in the median raw scores attained by workers 
on their own occupational keys. Electricians have a median score of 
plus 96, while bakers have a median score of minus 12. It is easy enough 
to see how high plus scores are attained, but why should a group get 
a minus score on its own key? The answer is to be found by studying
the percentage responses of bakers to the items which differentiate them 
from tradesmen-in-general. These item responses are generally un- 
popular items. Tradesmen-in-general select them only 10 to 20 per cent 
of the time; even bakers select each item, on the average, less than 
50 per cent of the time. This is a rather interesting finding, since it 
indicates that interest patterns may operate to differentiate occupational 
groups not only by use of items selected by an overwhelming majority 
of a group, but also by use of items actually rejected by a majority of a 
group. 

The differences in variability of the distributions of scores reflect
to a considerable extent the number of responses included in the scoring 
key. Since there are 570 items in the inventory, and since either re- 
sponse of like or dislike is scored, a total of 1140 responses are scorable. 
Whereas a large number of items differentiated electricians from non- 
electricians (226), only a small number of items did the same job for 
painters and non-painters (66). This difference is undoubtedly due, in 
part, to the kinds of items included in the inventory. It may also be 
due, in part, to real differences in the degree to which workers within an 
occupational group resemble each other. It is possible that painters, as 
an occupational group, have fewer basic interests in common than do 
electricians. 

A small dispersion of scores is usually associated with lower reliability, 
which suggests that the keys with less variability are also those with high 
percentages of overlap between criterion and control groups. A com- 
parison of the percentage overlap with the number of items in the key, as 
given in Table 2, gives no support to this notion. The present data 
indicate that a small dispersion of scores is not, in itself, undesirable. 


Relationships Between Scoring Keys 


Table 3 presents the correlations between scores on each of the eight 
keys obtained for the sample of 200 tradesmen whose inventories were 
scored on all keys. These correlations range from high positive to high 
negative values, giving the impression that the scores on various keys
tend to cluster in rather meaningful patterns. Thus, the interests of 
milk wagon drivers and bakers seem to have much in common, as do the 
interests of electricians, sheet metal workers, and plumbers. The in- 
terests of painters, on the other hand, have little relation to those of any 
of the other seven groups. 
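The report does not state the formula used for these correlations; the sketch below assumes the ordinary product-moment coefficient, computed across the same persons' scores on two keys. The function name and the score lists are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between paired scores on two keys."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of five men on three keys, for illustration only.
key_one = [10, 20, 30, 40, 50]
key_two = [12, 18, 33, 41, 47]     # rises with key_one
key_three = [40, 35, 22, 15, 3]    # falls as key_one rises

r_similar = pearson_r(key_one, key_two)      # close to +1
r_opposed = pearson_r(key_one, key_three)    # close to -1
```

Keys whose weighted responses largely coincide would give values like the first; keys that weight the same responses in opposite directions would give values like the second.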

The small number of occupations involved makes it difficult to de- 
termine to what extent these clusterings result from the sampling of 
occupational groups in the study, and to what extent they result from real 
similarities of interest of the different workers. It is easy to see how 
milk wagon drivers and bakers would appear very much alike when 
compared with workers in the building trades; whether or not this same 
degree of relationship would hold if the sample of occupations were larger 


Table 3

Intercorrelations of Scores on Eight Occupational Keys
for a Sample of 200 Tradesmen*

                       Plasterers   Milk Wagon     Printers Electricians     Painters       Bakers  Sheet Metal     Plumbers
                                       Drivers                                                          Workers

Plasterers                                -.11         -.60          .12          .30         -.21          .38          .26
Milk Wagon Drivers                                      .45         -.78          .03          .84         -.77         -.78
Printers                                                            -.68         -.06          .58         -.81         -.74
Electricians                                                                     -.19         -.83          .85          .85
Painters                                                                                       .04         -.12         -.13
Bakers                                                                                                     -.85         -.86
Sheet Metal Workers                                                                                                      .89

Mean                         11.2        -28.6        -24.1         56.5         -0.5        -41.9         49.3         56.4
Standard Deviation          13.35        19.00        22.85        32.89         7.80        26.17        25.56        28.14

* 25 men from each of the eight occupations were scored on all eight keys. Thus,
each key is scored for 25 members and 175 non-members of the given occupation.


and more heterogeneous is not clear. It is fairly certain that the actual 
separations of workers from outsiders achieved in this study would have 
been considerably more spectacular if a wider diversity of occupations 
had been included. The degree to which electricians, plumbers and 
sheet metal workers cluster, as do bakers and milk wagon drivers, tends 
to obscure the marked differences between the two clusters and the 
remaining occupations. 

Another method of portraying the clustering of occupations is used in 
Table 4, in which the median percentile score of each of the eight groups of 
25 is given for two keys—the electrician’s key and the milk wagon driver’s 
key. Percentile scores are computed on the distributions of scores of 
Table 4

Median Percentile Scores* on Each Occupational Key Attained by 25 Electricians
and 25 Milk Wagon Drivers

                          25 Milk Wagon    25 Elec-
Key                       Drivers          tricians

Plasterers 8
Milk Wagon Drivers 48
Printers
Electricians
Painters
Bakers
Sheet Metal Workers
Plumbers



* Percentile scores are computed using the distribution of scores of each group of 
employed workers on its own occupational key; that is, the distributions of scores for 
group A listed in Table 3. 


workers in the occupation. The close relationship between the milk 
wagon driver’s and the baker’s key is indicated when the median score 
of milk wagon drivers on the latter key is at the 26th percentile of the 
baker’s distribution of scores. The same sort of clustering occurs be- 
tween electricians, sheet metal workers, and plumbers. 


Summary 


The present report has analyzed the interests of members of eight 
A. F. of L. trade unions. When scoring keys are prepared to differentiate 
between members of a trade group and a composite group of tradesmen- 
in-general, it is found that: 


1. Workers in a trade can be separated from workers in other trades 
on the basis of their measured interests with considerable success. About 
six workers out of a hundred will exceed the median score of tradesmen 
in an occupation other than their own. 

2. The separation is achieved with a rather crude criterion for pre- 
paring scoring keys: that the response of the one group differ by eleven 
percentage points or more from the response of the composite tradesmen- 
in-general group. 

3. Distributions of scores on the different scoring keys vary consider- 
ably both in central tendency and in variability, but these values are 
not closely related to the goodness of the keys, as defined by the degree 
of separation between workers in and not in the trade. 

4. Correlations between scores on the eight keys indicate a clustering 
of trades with respect to measured interests. Workers in three unions
related to the building trades (electricians, plumbers, and sheet metal
workers) tend to have related interests, but to differ markedly both from 
workers in two service occupations (milk wagon drivers and bakers), 
and from workers in two other building trades (painters and plasterers). 

5. The data analyzed thus far seem to suggest that skilled trades 
groups may be ordered into families of occupations with rather similar 
interests, so that it may not be necessary or desirable to differentiate 
between closely related occupations either in preparing separate scoring 
keys, or in the guidance of young persons contemplating entry into these 
fields of work. However, this aspect of this program of research requires 
considerably more work than has been completed thus far. 
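
The key-building rule stated in item 2 can be illustrated with a minimal sketch. It assumes unit weights (+1 for items the trade group endorses more often than tradesmen-in-general, -1 for items it endorses less often), which the summary does not specify; the item texts, percentages, and function names below are hypothetical and are not taken from the report.

```python
def build_key(trade_pct, general_pct, threshold=11.0):
    """Keep items whose endorsement rate in the trade group differs from
    the tradesmen-in-general rate by `threshold` percentage points or more.
    Weights are +1 or -1 (an assumption made for this sketch)."""
    key = {}
    for item, p_trade in trade_pct.items():
        diff = p_trade - general_pct[item]
        if abs(diff) >= threshold:
            key[item] = 1 if diff > 0 else -1
    return key


def score(endorsed_items, key):
    """Sum the key weights over the items a respondent endorsed."""
    return sum(weight for item, weight in key.items() if item in endorsed_items)


# Hypothetical percentages of each group endorsing three made-up items.
electricians = {"repair a radio": 72.0, "bake bread": 10.0, "drive a truck": 41.0}
tradesmen = {"repair a radio": 48.0, "bake bread": 24.0, "drive a truck": 38.0}

key = build_key(electricians, tradesmen)
print(key)                                           # items retained in the key
print(score({"repair a radio", "drive a truck"}, key))
```

In this made-up example the third item differs by only three percentage points and is therefore left out of the key, so it contributes nothing to any respondent's score.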


Received December 16, 1948. 





A Selection Battery for Bake Shop Managers * 


Edwin B. Knauft 
Federal Bake Shops, Inc., Davenport, Iowa 


A number of investigators (1, 3, 6, 11, 16, 18) have attempted to 
develop and validate series of items or test batteries which would effi- 
ciently predict executive or supervisory job success. Some of these 
studies were moderately successful, but the majority reported “validity” 
data which were based only on the original population used in the stand- 
ardization or item analysis of the tests. It is generally recognized that 
the abilities or characteristics contributing to success variance in man- 
agerial positions are difficult to isolate and measure. The problem is 
further complicated by the fact that it is often impossible to obtain a 
relevant and reliable criterion of supervisory job success. In addition, 
validation is difficult because it is unusual for a large number of super- 
visors or managers to be engaged in the same or similar job duties. 

The objective of the present study is to construct and attempt to 
validate a series of written tests which will predict subsequent on-the-job 
behavior of shop managers in a retail-manufacturing bakery chain. 
Seventy-nine managers of bake shops were available for the initial re- 
search, 85 managerial applicants were used in the subsequent develop- 
ment of test norms and 33 new managers were followed up on the job 
and formed the cross validation study. 


The Bake Shop Manager 


The managers were employed by a chain which operates 88 retail- 
manufacturing bake shops in the Midwest, East and South. Each shop 
is under the supervision of a manager who directs both the manufacture 
of bakery products from raw materials and the sale of the products to the 
public. 


The principal duties of the manager may be summarized as follows: (1) purchases
raw materials; (2) directs and sometimes participates in the manufacture
of baked products from these raw materials; (3) determines the variety of
products and quantity of these products which shall be produced each day;
(4) computes the cost and determines the selling price of each product; (5)
hires, discharges and supervises the work of bakers, baker apprentices, salesgirls
and porters; (6) keeps financial records, pays employees and writes checks
for raw materials purchased; (7) sends financial, sales and inventory reports
to the home office; (8) works under the general supervision of a district manager.


* This paper is based on a thesis submitted in partial fulfillment for the Degree of
Doctor of Philosophy at the State University of Iowa. The writer acknowledges his
indebtedness to Professors Dewey B. Stuit and Harold B. Bechtoldt for their helpful
advice.


The Criterion Problem 


Since each manager has primary responsibility for the successful 
operation of his unit, it is reasonable to presume that the general financial 
condition of the unit—in terms of profit and loss—would reflect the 
abilities of the manager. The actual profits of each unit, however, are 
not a satisfactory measure of managerial ability because certain expenses 
not under the manager’s control affect the profits. The standard com- 
pany accounting procedures divide the expenses of each unit or shop into 
controllable and uncontrollable costs. Variables largely under the con- 
trol of the manager are grouped together and are known as total con- 
trollable costs. The actual dollar volume of these costs is partly a 
function of the total sales of a unit, and hence a direct comparison of those 
figures from unit to unit would place the manager of a small unit at con- 
siderable disadvantage. For this reason, the ratio of this cost to the 
total sales of the unit is computed and affords a unit to unit comparison. 
This measure, designated hereafter as total controllable cost, is one possible 
criterion measure of managerial ability. 


Data on the total controllable costs were obtained for all units from the 
1946 Company operating statements. The data were first analyzed by dis- 
tricts and it was found that there were rather large differences between the 
means of certain districts. An analysis of variance was made of these district 
data to test the hypothesis that the district means varied from each other only 
by chance. The resulting F value of 4.92 is significant at better than the 1% 
level of confidence, indicating that these differences may be due to factors 
other than chance. It therefore seemed reasonable to use the controllable 
cost ratios as a measure of individual manager performance only after these 
data had been corrected for the “district effect.” A district correction factor 
was applied to the 1946 cumulative cost percentages of each unit. The base 
for this correction factor was the difference between the 1946 controllable 
cost for the entire company and the corresponding value for the given district. 
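
The district analysis and correction described above can be sketched as follows. This is an illustration under stated assumptions, not the Company's procedure: the company-wide figure is approximated here by the unweighted mean of the unit ratios, the correction is applied additively to each unit, and the district names and cost ratios are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """F ratio of a one-way analysis of variance over a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def district_corrected(districts):
    """Shift every unit by (company figure - its district mean).
    The company figure is taken as the mean of all unit ratios (assumption)."""
    company = mean([x for units in districts.values() for x in units])
    return {
        name: [x + (company - mean(units)) for x in units]
        for name, units in districts.items()
    }


# Hypothetical controllable-cost ratios (per cent of sales) for two districts.
districts = {"A": [66.0, 68.0, 70.0], "B": [70.0, 72.0, 74.0]}

f_before = one_way_f(list(districts.values()))
corrected = district_corrected(districts)
f_after = one_way_f(list(corrected.values()))
print(f_before, f_after)
```

After the correction every district has the same mean, so the between-district variance vanishes while the differences among units within a district are left untouched.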

The relationship between length of time in a managerial position and the 
above corrected measures was investigated to determine the effect of experience. 
It was found that nine men who had been managers from three to six months 
did not have a mean corrected controllable cost percentage which was signifi- 
cantly different from the mean of similar measures for 61 managers who had 
been on the job for more than one year. It therefore seemed reasonable to 
include in the original study all managers who had been on the job for three 
months or more. 

The corrected reliability of the corrected controllable cost data, by units, as 
estimated from the values of odd vs. even months, was .96. 

Several members of the management of the Company felt that the raw 
materials cost should be given more weight in a composite criterion than was 
reflected by the actual contribution of raw materials to total controllable cost. 
There was available a raw materials percentage (ratio of dollars spent for raw 
materials to dollar sales of the unit) which reflects the manager’s ability to 
buy raw materials wisely, to prevent waste during production and, indirectly, 
to correctly set the selling prices of his products. The raw materials percentage 
for each unit for 1946 was used as a second criterion measure. These data 
were analyzed by districts in the same manner as the total controllable costs 
and again the analysis of variance yielded an F value (3.14) which was significant
at better than the 1% level of confidence. A correction factor, computed
in a manner similar to that described above for controllable costs, was calculated
on the basis of the difference between the raw material percentage for 
each district and the raw material percentage for the entire company. The 
correction factor for each district was then applied to the raw material per- 
centage of each unit in the given district. The reliability of the raw material 
measure was estimated by correlating the corrected unit data of odd months 
of 1946 with even months. The resulting corrected coefficient was .92. These 
reliability data were based on 63 units in which there was no change in managers 
during the year. 

A Subjective Rating of Performance. The manager’s job is so complex that 
many aspects of managership probably are not directly reflected in the two 
criterion measures which have been discussed. Some type of merit rating 
procedure appeared to be the only technique which would measure those 
managerial qualities not reflected in financial data of the units. A survey 
of various types of personnel merit rating methods (9) led to the conclusion 
that the weighted check-list type of rating scale would be appropriate in the 
present situation. This technique, involving the equal appearing intervals 
method of Thurstone (17), was first applied in a merit rating situation by 
Richardson and Kuder (14). 

The procedure used to construct a weighted check-list rating scale for bake 
shop managers has previously been reported in detail (10). This scale, which 
was comprised of two forms of 24 items each, was used to evaluate the 79 
managers in the initial study. The basic rating data were obtained from the 
evaluations resulting when each district manager applied both forms of the 
scale to his unit managers. The product moment correlation of scores obtained 
on the two forms of the scale was .79. The reliability of the combined scale 
consisting of both forms was estimated by the Spearman-Brown formula to 
be .88. Additional data were available on 35 of the managers who were also 
rated by their respective assistant district managers. The reliability coeffi- 
cient of the scale, based on the ratings of the 35 managers by two superiors 
was .81. 
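
The step from .79 for a single form to .88 for the combined scale is the usual Spearman-Brown estimate for a test of doubled length. A short sketch of that arithmetic follows; the function name and the `factor` parameter are illustrative.

```python
def spearman_brown(r, factor=2.0):
    """Estimated reliability of a test lengthened by `factor`,
    given the reliability (or inter-form correlation) r of the original."""
    return factor * r / (1.0 + (factor - 1.0) * r)


combined = spearman_brown(0.79)
print(round(combined, 2))
```

With r = .79 and a doubling of length the formula gives 1.58 / 1.79, which rounds to the .88 reported above.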

The rating used as a criterion measure for each manager was the mean 
of the scores received on the two forms of the scale. The rating scores assigned 
by the district managers were subjected to an analysis of variance to determine 
if there were significant differences between the mean ratings made by the 
various district managers. The resulting F value of 2.15 was not significant 
at the 5% level of confidence, indicating that the differences in mean ratings 
may be attributed to chance alone. 

The effect of job experience on the rating score was checked by comparing 
the mean rating scores obtained on nine managers with three to six months’ 
experience with scores of men who had managed for more than one year. In 
the absence of a significant difference between these groups, using the t test 
for small samples, it appears that within the time limits studied, variation in 
experience is not associated with the average rating score. 


Combination of the Criterion Measures. It is first necessary to examine 
the comparability of the three criterion measures in terms of their 
respective means and standard deviations. These data, together with 
the reliability estimates for the three measures, are summarized in Table 
1. These figures are based on data corrected for the district effects. 
This table indicates that the three measures are not directly comparable 
in their present form because they do not have equal variances nor equal 
scale units. 


Table 1

Summary Data of the Criterion Variates

                             Reliability
Criterion                    Estimate       Mean     S.D.

Total controllable costs       .96          68.1     3.3
Raw material costs             .86          37.5     1.9
Rating score                   .88           5.5     0.7





The intercorrelations between the measures presented in Table 2 
indicate the extent to which the criterion variates overlap. In inter- 
preting this table it should be noted that controllable costs and raw 
material costs are experimentally dependent, and even the rating score 
may not be entirely independent of the above two because the rater’s 
knowledge of a manager’s operating data might influence some of the 
rater’s responses to the rating form. The raters, however, did not see any 
operating data after district corrections had been made. The intercorrelations
of Table 2 suggest that these variables may have a conspicuous
common element which we may call managerial success or
ability on the job. On the supposition that a common factor is being
measured, it is possible to combine these three measures into a composite
criterion.


Table 2

Intercorrelations of Criterion Variates

Comparison                                      r

Controllable costs vs. raw material costs      .65*
Controllable costs vs. rating score            .33
Raw material costs vs. rating score            .41

* This coefficient of .65 may be regarded only as an approximation of the true
correlation because of two conditions: (1) raw material costs are actually a portion of
total controllable costs and hence the two measures overlap, and (2) raw material
costs and total controllable costs are both ratios which have the same denominator,
viz., total sales. Correction for the former source of error may be made by Peters
and Van Voorhis formula for the correlation between overlapping arrays (13, p. 215-217).
An application of this correction here yields a value of .10 which seems unreasonably
low because it is based upon the assumption that the second effect mentioned is
negligible and that the two ratios have a zero correlation. This second effect can be
checked by a partial correlation technique which is inappropriate here because of the
small number of cases.


The individual measures on each variable were converted
into normalized standard scores, using the percentile method, and the 
three standard scores were averaged for each individual manager. This 
procedure gave equal nominal weights to each of the three variables, but 
the effect of the intercorrelations and the variance relationship between 
total controllable costs and raw material costs actually gives a greater 
effective weight to raw material costs. Because of the greater impor- 
tance attached to this latter variable by top management, the weighting 
used here is in the desired direction and appears to yield a satisfactory 
combined criterion measure. 
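
A minimal sketch of how such a composite might be formed is given below. It assumes that the "percentile method" means converting each manager's mid-rank percentile to the corresponding normal deviate, that the two cost measures are reversed so that a higher score is always better, and that there are no tied values; the data for four managers and the function name are hypothetical.

```python
from statistics import NormalDist, mean


def normalized_scores(values, higher_is_better=True):
    """Rank the values, convert each rank to a mid-rank percentile,
    and return the matching standard normal deviate (ties are ignored)."""
    n = len(values)
    signed = values if higher_is_better else [-v for v in values]
    order = sorted(range(n), key=lambda i: signed[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return z


# Hypothetical district-corrected data for four managers.
controllable = [66.0, 70.0, 68.0, 72.0]   # per cent of sales, lower is better
raw_material = [36.0, 38.0, 37.0, 40.0]   # per cent of sales, lower is better
rating = [6.1, 5.2, 5.6, 4.9]             # scale value, higher is better

z_scores = [
    normalized_scores(controllable, higher_is_better=False),
    normalized_scores(raw_material, higher_is_better=False),
    normalized_scores(rating),
]
# Equal nominal weights: the composite is the mean of the three standard scores.
composite = [mean(triple) for triple in zip(*z_scores)]
print([round(c, 2) for c in composite])
```

Averaging gives each variable the same nominal weight; as the text notes, the effective weights still depend on how the three measures intercorrelate.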


Preliminary Test Battery 


The preliminary test battery was assembled and administered to all 
bake shop managers. Those managers who had three or more months 
experience formed the criterion group which was used in the evaluation 
and item analyses of the several tests. Following is a brief description of 
these tests and preliminary results obtained from them. 


General Mental Ability. The research of previous investigators (1, 6, 12) 
indicates that there is some positive relationship between scores on short 
mental ability tests and job success in certain managerial or supervisory 
positions. The Wonderlic Personnel Test, a 12 minute revision of the Otis 
S-A Test, scored by taking the number of correct responses made in the 12 
minute time interval and correcting the score for age by the use of Wonderlic’s 
table (19), was included in the present battery. The scores of the population 
of 79 managers ranged from 7 to 34 with a mean of 19.8 and an S. D. of 6.5. 
A significant relationship between test score and educational level is revealed 
by a correlation of .39. The correlation of test score with the composite cri- 
terion was .13, while a correlation of .20 was obtained between test score and 
size of unit the manager operated. The latter value just fails to be significant, 
for an r of .22 is required for the 5% confidence level for this size of sample. 
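
The threshold of .22 can be checked approximately from the relation r = t / sqrt(t^2 + df), with df = N - 2. The sketch below substitutes the normal deviate for the exact t value, an approximation that is adequate for 77 degrees of freedom; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| significant at the two-sided `alpha` level for n cases.
    Uses the normal deviate in place of the exact t value (approximation)."""
    df = n - 2
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return crit / sqrt(crit ** 2 + df)


threshold = critical_r(79)
print(round(threshold, 2))
```

For the 79 managers this gives a value that rounds to .22, so the observed correlation of .20 falls just short of it.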

Preference, Interests and Attitudes. Previous investigators agree that the 
personality of the manager or supervisor is one of the most important single 
elements contributing to job success. Tests in this area have thus far been 
rather unsuccessful as selection instruments, partially because such paper 
and pencil inventories can often be “beaten” by the applicant. In the selection
situation the applicant’s motives may lead to responses which he thinks will
help him obtain the position. A second shortcoming of the usual “personality
test” is that it is scored in terms of a number of traits such as dominance, introversion,
frankness, etc. Since it cannot be precisely determined if these traits
are required for success on a given job, the industrial validation of such a
“test” is a difficult and often impossible procedure. 

Jurgensen (7) has recognized the shortcomings of the common types of 
personality inventories when used as selection instruments, and has constructed 
a device which he believes possesses distinct advantages in the industrial 
situation. He has utilized the forced choice technique of item arrangement 
which has also been used in the selection of army officers (15). The three 
principal advantages of Jurgensen’s Classification Inventory are that: 1. the
applicant is generally not able to predict the “right” answers when attempting
to secure a job; 2. the test is scored and validated on a specific job and a
scoring key developed for the job in a given company; and 3. no hypothetical
“trait scores” are necessary because each item can be correlated separately
with the criterion.

A tentative scoring key for bake shop managers was developed in accordance
with the procedure recommended by Jurgensen (8). The criterion 
population of 79 managers was split into two groups which comprised the 
highest 27% and lowest 27% in terms of the composite criterion. The scoring 
keys were based on items which differentiated between the high and low 
criterion groups at the 10% level of confidence or better. The resulting 
scores of the managers correlated .64 with the composite criterion, but this 
value must be interpreted with extreme caution because 54% of this popula- 
tion was used in the construction of scoring keys for this test. 
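
The extreme-groups item analysis described above can be sketched as follows. The split into the highest and lowest 27% follows the text; the significance test used here, a two-sided two-proportion z test, is an assumption made for illustration, since the procedure actually recommended by Jurgensen is not described in this paper. All names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def extreme_groups(criterion, fraction=0.27):
    """Indices of the lowest and highest `fraction` of cases on the criterion."""
    k = round(len(criterion) * fraction)
    order = sorted(range(len(criterion)), key=lambda i: criterion[i])
    return order[:k], order[-k:]


def item_discriminates(endorsed, low, high, alpha=0.10):
    """Two-proportion z test between the high and low criterion groups.
    `endorsed` holds 1 where a manager chose the response, else 0."""
    p_low = sum(endorsed[i] for i in low) / len(low)
    p_high = sum(endorsed[i] for i in high) / len(high)
    pooled = (p_low * len(low) + p_high * len(high)) / (len(low) + len(high))
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / len(low) + 1.0 / len(high)))
    if se == 0:
        return False  # everyone (or no one) in both groups chose the response
    z = (p_high - p_low) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z))) <= alpha


# Hypothetical criterion scores for 79 managers (here simply 0..78).
criterion = list(range(79))
low, high = extreme_groups(criterion)
print(len(low), len(high))
```

With 79 managers each extreme group contains 21 men, so a little over half of the population enters the item analysis, which is why validity figures computed on the same men have to be read with caution.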

Job Information. In most instances, managers or supervisors are selected 
from existing employees and are expected to possess some knowledge of the 
production work or technical specialty performed by the persons they will 
supervise. Bake shop managers are expected to possess considerable baking 
experience and they must be able to recognize why products are below standard. 
In addition, it seemed desirable to know the amount of baking information 
possessed by a manager-candidate so that the training of the individual could 
be arranged accordingly. A test was constructed in which the majority of items 
were directed towards measuring the “diagnostic baking sense” of the manager. 
The following multiple choice item is an example: 

Which one of the following conditions is most likely to result in dull crust 
color on Danish pastry? A. Not enough dusting flour; B. Underproofing; 
C. Low egg content; D. Old dough. 

The Baking Knowledge Test consisted of 46 items. All 79 managers in 
the original population were tested and a correlation of .16 was obtained 
between test score and the composite criterion. The test was then subjected 
to an item analysis in order to determine which items differentiated between 
good and poor managers. The highest and lowest 27% of the criterion group 
were again used for the item analysis. When the Baking Knowledge Test 
was rescored, using only 12 items which met the criterion of discrimination, 
the scores correlated .37 with the composite criterion of managerial ability. 
This figure must be regarded as a spuriously high validity estimate because 
54% of this population was used in item analysis of the Baking Knowledge 
Test. 

Judgment in Managerial Problems. A number of items were constructed 
which represent some of the decisions and judgments a bake shop manager 
must make. The items were arranged in multiple choice form, but the respond- 
ent is required to select the first, second and third best choices from the 
alternatives in each item: 


Assume your best selling item is a pecan ring that sells for 35¢ each. This 
item accounts for 10% of your sales. Pecans are selling for 40¢ per pound. 
Then the price of pecans jumps to $1.25 per pound because of crop failure. 
Would you— 


A. Stop making pecan rings and try to build up sales on another item; 

B. Increase the price of pecan rings to 60¢ because the material cost 
will be in line at this price; 

C. Leave the price the same and try to make up your loss by increasing 
prices a little on other items; 

D. Use only one-third as many pecans as before and leave the price the 
same. 


Items of this type have been grouped together as the Federal Management 
Test. A rank order of several choices per item was required because Cardall 
(4) found the second or third choice responses to an item are sometimes more 
discriminating than the first or “best” choice. The item analysis and scoring
key were based on the responses of the top and bottom 27% of the criterion
group.

The scoring key was constructed by computing the percentage of “high” 
and “low” criterion managers responding to each item alternative as a first, 
second or third choice. This analysis yielded 12 items containing one or more 
responses which successfully differentiated between the high and low criterion 
groups. Scores on this test correlated .52 with the composite criterion. 

A second test was constructed which was designed to measure managerial 
judgment in a specific context. One of the important daily duties of the 
manager is that of ordering the quantity and selection of baked goods to be 
produced for each day. This requires an accurate estimate of expected volume 
of business on the following day so that the shop will not be sold out before 
closing time and, conversely, that few or no items need be carried over as 
“stales.” The Bake Order Problem was constructed on the assumption that 
a hypothetical store under given weather conditions could serve as a basis for 
measuring an individual’s ability to use correct judgment in making out the 
order. After this problem was administered to the criterion population, 
responses to different portions of the problem were analyzed to determine if 
there were significant differences between the high and low 27% criterion 
groups in terms of quantity of each item ordered and variety of items ordered. 
A total of 13 analyses were made on different portions of the problem, but all 
results were negative and this problem was omitted from the revised battery. 

Biographical Data. As early as 1922 Goldsmith (5) found that biographical 
information items were helpful in the selection of insurance salesmen. Uhr- 
brock and Richardson (18) and the Army (15) have both used such items in 
personnel selection batteries. Personal data were collected in the present 
study by means of a Biographical Information Blank which included 33 items. 
Since these items were first being used on present managers, it was necessary 
for the testee to answer each item as it applied when he first became a manager. 
For example: 

How many of the following did you own or were you buying just before 
you became a Federal manager? (Mark as many answers as apply): A. Stocks 
or bonds ($100 to $300); B. Stocks or bonds (more than $300); C. A house; 
D. Home furnishings; E. A bakery; F. A car. 

An item analysis was performed on the 33 biographical items using the high 
27% and low 27% of the criterion groups responding to each alternative. 
The resulting response frequencies for most item alternatives were extremely 
small and consequently the validity estimates were unstable. It was also 
found that only a small number of items discriminated between the two 
criterion groups. For these reasons the Biographical Information Blank was 
omitted from further consideration. 

Name and Number Checking. The typical bake shop manager spends about 
one hour per day on report work and simple bookkeeping. In the light of 
these activities the Minnesota Clerical Test was included in the preliminary 
battery. The results obtained indicated a lack of correlation between responses 
on this test and the composite criterion. The correlation was .19 between 
“numbers” score and the criterion and -.15 between “names” score and the
criterion. The 79 managers obtained a mean score of 108.7 and an S. D. of
30.3 on the Numbers Test and a mean of 92.5 and S. D. of 28.5 on the Names
Test. These mean scores are rather low when compared to norms for male
clerical workers reported by Andrew and Paterson (2). It may be hypothesized
that the managers in the present study do not perform enough clerical work
or can do this work at their own pace in a manner which is not identical with
a perceptual speed factor which probably operates in the Minnesota Clerical
Test.


The Revised Battery 


The following tests were included in the revised battery: Classification 
Inventory, Baking Knowledge Test, Federal Management Test and
Wonderlic Personnel Test. This battery was administered to a new population
of 85 manager applicants already in the employ of the company as 
bakers. This population was used to establish tentative norms for the 
tests. Thirty-three of these applicants were selected for manager train- 
ing and subsequently became managers. These men formed the cross 
validation population. In addition, 23 present managers who had taken 
the preliminary battery were retested on the revised battery to furnish 
reliability data on the various tests. 

The population of 85 applicants had a mean age of 32.8 years, a mean 
education of 10.0 grades and a mean of 11.2 years of civilian baking experience.
Corresponding data for the original group of managers indicate
a mean age of 42.7 and mean education of 10.0 grades. 

Reliability of the Battery. Reliability data are based on 23 managers 
who were retested seven months after the original testing. The reli- 
ability estimates for the battery are presented in Table 3. 

The low reliability of the Federal Management Test casts doubt on 
its usefulness. 


Table 3

Test-Retest Reliability Estimates for Tests in the Revised Battery (N = 23)

Test                                          Reliability (r)

Classification Inventory                           .78
Baking Knowledge                                   .77
Federal Management                                 .46
Wonderlic Personnel Test (Forms A and B)           .85





The mean scores obtained by these 23 men on test and retest sessions 
were analyzed to determine if any significant shifts had occurred. It 
was found that there was no significant change in group mean scores for 
the Classification Inventory and Federal Management Test. However, 
mean scores on both the Baking Knowledge Test and Personnel Test 
increased significantly. These differences were significant at better than 
the 1% confidence level. It is possible that familiarity with the testing 
situation, positive practice effect and memory may have accounted for 
the increase in scores on the latter two tests. 

Intercorrelations of Test Scores. One of the interesting findings ob- 
tained from the applicant group was a revised set of intercorrelations of 
the several tests. The test intercorrelations from the criterion population 
and from the applicant population are presented in Table 4. This table 
shows that the original intercorrelations on all tests except the Personnel 
Test were much higher than the values obtained from the applicant 
population. This finding might be anticipated because the Personnel 
Test was the only one of the four tests which was not item analyzed or 
scored on the basis of the responses of the original population of managers. 
None of the intercorrelations involving the Personnel Test shifted 
markedly on the new population, whereas the other values show a definite 
decrease for the applicant group. Such results accent the fact that 
correlational values obtained from a population which is used in the con- 
struction or item analysis of tests will generally be spuriously high. 


Table 4

Intercorrelation of Tests

Note: Superior values in each cell are based on 85 applicants.
Values in parentheses are based on criterion population.

                              Baking        Federal        Personnel
                              Knowledge     Management     Test

Classification Inventory      -.16          -.02            .38
                              (.44)         (.48)          (.28)

Baking Knowledge                             .04            .17
                                            (.41)          (.19)

Federal Management                                         -.10
                                                           (.00)





The only significantly positive intercorrelation for the applicant 
population is between the Personnel Test and the Classification In- 
ventory. Thus these two measures are somewhat dependent, although 
the size of the correlation does not indicate a marked “overlap.” Only 
the Baking Knowledge Test correlated significantly (r = .45) with num- 
ber of years of baking experience. None of the tests correlated signifi- 
cantly with age and only the Personnel Test correlated significantly with 
number of years of education (r = .43). 

Validity Data. Thirty-three of the 85 applicants were appointed as 
unit managers. These men were given a Company manager training 
program before being assigned to a managerial position. The 33 men had 
actually been managing for an average of 8.8 months when the follow-up 
study was conducted and criteria of their job performance were collected. 
Criterion data were obtained on these men by the same procedures as 
were used in the original study and the composite criterion used in the 
follow-up study is identical in composition with that used in the original 
study. 

It should be pointed out that the district manager (who was the rater) 
knew the new manager’s test scores in ten out of the 33 cases. This 
possibility of criterion contamination is probably slight, especially because 
the ratings constitute only a portion of the composite criterion. 

The validity coefficients of the various tests in the revised battery 
are presented in Table 5. Correlation coefficients are not very appropri- 
ate measures for such a small sample. In the present sample the Classifi- 
cation Inventory has a coefficient which is significant at better than the 
5% level. None of the other coefficients in Table 5 are significant. 
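
The significance statement above can be checked with the usual conversion of a correlation coefficient to a t statistic, t = r·sqrt(N − 2)/sqrt(1 − r²). The following is a minimal sketch only: the function name is hypothetical, the critical value of about 2.04 for 31 degrees of freedom (two-tailed, 5% level) is taken from standard t tables and is not stated in the article, and the coefficients are those reported in Table 5.

```python
import math

# Approximate two-tailed 5% critical value of t for df = 31, taken from
# standard t tables (an assumption; the article does not state it).
T_CRIT_05_DF31 = 2.04


def t_from_r(r: float, n: int) -> float:
    """Convert a product-moment correlation r, based on n cases, to a t statistic."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Validity coefficients as reported in Table 5 (N = 33).
coefficients = {
    "Classification Inventory": 0.39,
    "Baking Knowledge": -0.12,
    "Federal Management": 0.06,
    "Wonderlic Personnel Test": 0.26,
}

# A coefficient is flagged when |t| exceeds the assumed critical value.
significant = {
    name: abs(t_from_r(r, 33)) > T_CRIT_05_DF31
    for name, r in coefficients.items()
}

for name, r in coefficients.items():
    print(f"{name}: t = {t_from_r(r, 33):.2f}, significant = {significant[name]}")
```

Under these assumptions only the Classification Inventory exceeds the critical value, which agrees with the statement in the text.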

An alternate method of estimating the validity of the several tests is 
to compare the test scores of men who had high criterion scores with the 
scores of men who received low criterion scores. For this analysis the 
population of 33 managers was divided into the best 16 and poorest 16 
on the basis of the composite criterion measures. The remaining case 
was omitted from this analysis. The “high” group of managers made 
significantly higher scores than the “low” group on two of the tests—the 
Classification Inventory and the Personnel Test. In both cases, the t 
values for the differences between mean scores were significant at better 
than the 5% confidence level. The Baking Knowledge Test and the 
Federal Management Test failed to differentiate between good and poor 
managers. 


Table 5 


Correlations Between Test Scores and Criterion (N = 33) 














Test                           Validity Coefficient

Classification Inventory               .39
Baking Knowledge                      -.12
Federal Management                     .06
Wonderlic Personnel Test               .26





It was decided to eliminate the Federal Management Test from further 
use as a selection device because of its lack of validity and its low reli- 
ability coefficient (r = .46). Although the Baking Knowledge Test did 
not appear to be a valid predictor of managerial success, this test was 
retained in the battery because it would be useful in determining the 
amount of baking training the applicant would require before he was 
installed as a manager. 

The scattergrams of test scores against criterion measures for the 
Classification Inventory and Personnel Test were inspected in an attempt 
to set cutting scores for these tests. It was found that a cutting score 
of 16 on the Personnel Test would have eliminated 44% of the “low” 
managers (poorest half on the criterion) but would have eliminated only 
6% of the “good” managers. This raw score of 16 is equivalent to a 
percentile rank of 25, as determined from the population of 85 applicants. 







Similarly, a passing score of 25 (percentile rank of 48) on the Classification 
Inventory would have eliminated 50% of the “poor” managers and 36% 
of the “good” managers. If these two tests were used together and both 
of the above cutting scores had been used in combination, 63% of the 
“poor” managers would have been rejected as opposed to 36% of the 
“good” managers. On the basis of these combined cutting scores, 45% 
of the 85 applicants would have been considered as acceptable for mana- 
gerial training. However, these cutting scores should be considered as 
tentative until they are validated on a second independent sample. 
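
The way two cutting scores combine can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: the score pairs are invented, the function name is hypothetical, and it is assumed that a man is rejected when he falls below either cutting score. Only the two cutting scores (16 and 25) come from the text.

```python
from typing import List, Tuple

# Cutting scores quoted in the text.
PERSONNEL_CUT = 16
CLASSIFICATION_CUT = 25


def percent_rejected(scores: List[Tuple[int, int]]) -> float:
    """Percentage of cases falling below either cutting score.

    Each case is a (personnel_test, classification_inventory) pair.
    Assumption: a score below the cut on either test means rejection.
    """
    rejected = sum(
        1
        for personnel, classification in scores
        if personnel < PERSONNEL_CUT or classification < CLASSIFICATION_CUT
    )
    return 100.0 * rejected / len(scores)


# Hypothetical score pairs, invented for illustration only.
poor_managers = [(12, 20), (15, 30), (18, 22), (20, 28),
                 (14, 24), (22, 31), (17, 19), (25, 27)]
good_managers = [(21, 29), (19, 24), (24, 33), (15, 26),
                 (28, 30), (23, 27), (20, 35), (26, 23)]

print(f"poor managers rejected: {percent_rejected(poor_managers):.1f}%")
print(f"good managers rejected: {percent_rejected(good_managers):.1f}%")
```

Because a man rejected on either test is rejected by the combination, the combined rule can never reject fewer men than either cutting score applied alone.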


Summary 


A study has been made of the prediction of managerial success in a 
retail-manufacturing bakery chain. An empirical approach has been 
used to select tests and to select and weight items on the basis of the 
responses of a criterion population of 79 managers. Three criterion 
measures were obtained on these managers and these were combined into 
a composite criterion score for each manager. 

A preliminary battery of seven tests was administered to the 79 
managers. A subsequent item analysis suggested that a revised battery 
be assembled. This contained the Baking Knowledge Test, the Wonderlic 
Personnel Test, the Classification Inventory and the Federal Manage- 
ment Test. This revised battery was administered to 85 applicants for 
managerial positions. New test norms, intercorrelational data and reli- 
ability data were obtained from this population. All tests in the revised 
battery except the Federal Management Test had acceptable reliability 
coefficients. 

The validity of the battery was estimated by comparing the test 
scores and criterion scores of 33 of the applicants who subsequently be- 
came managers. Only one of the tests—the Classification Inventory— 
had a validity coefficient which was significantly different from a zero 
correlation at the 5% level of confidence. A comparison of the mean 
test scores of the upper and lower halves of this group, based on the 
criterion, indicates that both the Wonderlic Personnel Test and the 
Classification Inventory significantly differentiated between the good and 
poor managers. The Federal Management Test and the Baking Knowl- 
edge Test lacked validity, but the latter test was of value in determining 
how much additional baking training was required by the individual. 
Received December 6, 1948. 
A Note on Mechanical Aptitude of West Texans 


Albert Barnett 
Texas Technological College 


Two tests claiming to measure mechanical aptitude have been rather 
widely used at Texas Technological College. For a number of years, 
freshmen, as part of an orientation program, were given the Revised 
Minnesota Paper Form Board, a paper-and-pencil test requiring the 
testee to combine in his imagination a few disarranged geometrical plane 
figures to form one large figure and select the correct answer from among 
four or five suggested solutions. The Minnesota Spatial Relations test 
requires that the testee fill a number of irregular holes in each of four 
form-boards with the appropriate cut-out blocks, no two of which are 
alike, the score being the number of seconds required to complete the 
task. This test has been used for some time on an individual basis at 
the Texas Tech. Guidance Center. 

It is evident that neither of these tests places much, if any, emphasis 
on mere hand skills, but on the mental factor of space relationship, which, 
it is claimed, accounts in part for mechanical aptitude. 

During the fall semester of 1941, the Revised Minnesota Paper Form 
Board AA was administered to 371 freshmen (mainly from West Texas) of the 
Arts and Science Division of Texas Technological College. Their mean 
chronological age was between 18 and 19 years, the approximate range 
being 15-23 years. Their median score was 42.5, equivalent to the 
70%ile on the norms of liberal arts freshman men, whose median score 
was 38. The fact that these young freshman liberal arts boys, on the 
average, excelled 70 per cent of the standardization group in mechanical 
aptitude was merely noted, but not explained. 

During several months in 1947-1948, a record was made of the Minnesota 
Spatial Relations Test scores of men coming to the Texas Tech. 
Guidance Center for vocational advisement. These men, mainly 
in their twenties and thirties, came from several different West Texas 
counties, and represented every educational level from the illiterate to the 
college graduate. Each man was tested individually by a trained psycho- 
metrist. Results are shown in Table 1. It may be noted that the mean 
time required by this sample of 383 men to complete the test was 973.5 
seconds, which compares to the mean (apparently) of 1279 seconds for 
the norms furnished by the publishers of the test. The difference 
between the Texas group and the norm group is revealed in Table 2, which 
shows the score range equivalent to the letter ratings on the test for the 
Texas group compared to the published test norms.¹ 


Table 1 

Minnesota Spatial Relations Test Results 
(383 Men, Texas Tech. Guidance Center) 

Seconds Required for 
Completing Test            f     Percentile    Standard Score 

 500-599                   3       99.19           6.90 
 600-699                  16       96.72           6.45 
 700-799                  59       86.97           6.00 
 800-899                  87       67.99           5.55 
 900-999                  74       47.06           5.11 
1000-1099                 63       29.25           4.66 
1100-1199                 29       17.29           4.21 
1200-1299                 21       10.79           3.76 
1300-1399                 11        6.63           3.31 
1400-1499                  6        4.42           2.86 
1500-1599                 10        2.34           2.41 
1600-1699                  0        1.04           1.96 
1700-1799                  1         .91           1.52 
1800-1899                  1         .65           1.07 
1900-1999                  2         .26           0.62 

Total                    383 

M = 973.5      σ_M = 11.4      σ = 222.31 


Table 2 

Norms for Men Compared to Achievement at the Texas Tech. Guidance Center 

                           Time in Seconds for all Four Boards 
Letter     Mid-Sigma 
Rating     Score           Texas Group          Published Norms 

A          7.0+            Up to 639            Up to 936 
B          6.0             640-861              937-1131 
C          5.0             862-1085             1132-1427 
D          4.0             1086-1307            1428-1934 
E          3.0-            1308 and above       1935 and above 
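
The Texas-group figures in Tables 1 and 2 appear to follow from the reported mean and standard deviation. The sketch below is an illustration under assumptions not stated in the note: standard scores are taken to have a mean of 5.0 with one unit per standard deviation (faster times scoring higher), and the letter-rating boundaries are taken to lie half a unit on either side of each mid-sigma score. The function names are hypothetical.

```python
import math

# Summary statistics reported under Table 1 for the Texas group.
MEAN_SECONDS = 973.5
SD_SECONDS = 222.31
N_MEN = 383


def standard_score(seconds: float) -> float:
    """Standard score with mean 5.0 and unit SD; faster times score higher."""
    return 5.0 + (MEAN_SECONDS - seconds) / SD_SECONDS


def time_at_sigma(sigma_score: float) -> float:
    """Completion time in seconds corresponding to a given standard score."""
    return MEAN_SECONDS - (sigma_score - 5.0) * SD_SECONDS


# Standard error of the mean, sigma divided by the square root of N.
se_mean = SD_SECONDS / math.sqrt(N_MEN)

# Assumed boundaries between adjacent letter ratings (A/B, B/C, C/D, D/E).
boundaries = {s: round(time_at_sigma(s)) for s in (6.5, 5.5, 4.5, 3.5)}

print(f"standard error of the mean = {se_mean:.1f}")
print(f"standard score at 950 seconds = {standard_score(950):.2f}")
print(boundaries)
```

Under these assumptions the boundaries fall at 640, 862, 1085, and 1307 seconds, which agree with the Texas Group column of Table 2, and the standard error of the mean rounds to the 11.4 printed under Table 1.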

As yet, no satisfactory explanation has been found for this superiority 
(as tested) of West Texas men in mechanical aptitude. It is true that 
the region from which the Texas group came is one of mechanized 
farming. Most of these men had been accustomed to tractors and other 
machines since boyhood. Some worked with machines in the oil fields of 
this region. The Spatial Relations Test, however, is supposed to be, as 
stated in the Examiner’s Manual, “relatively free from the influence of 
previous mechanical experience.” 

¹ Minnesota Spatial Relations Test: Examiner's Manual (Minneapolis: Educational 
Test Bureau), p. 3. 

As stated previously, the men who were tested at the Texas Tech. 
Guidance Center were young men. It is not known whether or not they 
were as a group younger than the standardization group. As a check 
on the possible influence of age on test performance, the Texas group 
was separated into two discrete sub-groups, namely, those requiring 
one thousand seconds or more to complete all four boards of the test and 
those who completed the test in eight hundred seconds or less. The 
former group had a mean age of 26.7 years as compared to 24.2 for the 
latter, the standard error of the difference being .66 and the critical ratio 
3.76. While it is true that the poorer performance is associated with the 
older group, there is much overlapping. Furthermore, it is possible that 
among the older men, those who had failed to adjust occupationally be- 
cause of poor mechanical aptitude, tended to present themselves for 
testing and guidance more than was the case of those who had adjusted. 
Further study needs to be made on the relationship of tested hand skills 
to tested mechanical aptitudes. 
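
The critical ratio quoted above is the difference between the two mean ages divided by the standard error of that difference. A minimal sketch using the rounded figures printed in the text follows; the function name is hypothetical, and the small gap between the result and the printed 3.76 is presumably due to rounding of the published means, which is an assumption.

```python
def critical_ratio(mean_a: float, mean_b: float, se_diff: float) -> float:
    """Difference between two means divided by the standard error of the difference."""
    return (mean_a - mean_b) / se_diff


# Mean ages of the slower (26.7) and faster (24.2) sub-groups, and the
# standard error of the difference (.66), as reported in the text.
cr = critical_ratio(26.7, 24.2, 0.66)

print(f"critical ratio = {cr:.2f}")
```

A ratio of this size is well beyond the values conventionally required for significance, so the age difference between the two sub-groups is unlikely to be a chance result, although, as noted, the overlap between them is large.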


Received December 16, 1948. 





Work Satisfaction and Work Efficiency of Vocational 
Counselors as Related to Measured Interests * 


Salvatore G. DiMichael 
Office of Vocational Rehabilitation, Federal Security Agency 


This article reports another phase of a broad study designed to obtain 
a more complete understanding of personnel engaged as vocational re- 
habilitation counselors for the civilian disabled. A previous article de- 
scribed the experimental study which devoted major attention to a 
determination of the pattern of measured interests and of the relation- 
ships between measured and self-estimated interests for a group of coun- 
selors. It was found that the typical profile of measured interests on 
the Kuder Preference Record was sharply differentiated from the general 
population; that the highest median vocational interest areas were Social 
Service (98 %ile), Persuasive (82 %ile), and Literary (65 %ile); that the 
reliability coefficients for the scales ranged from .70 to .89 with an average 
time interval of 5 months between tests; that self-estimated interests 
generally correlated to a substantial degree (median r = .56) with meas- 
ured interests; and that when the counselors had previous knowledge of 
their Kuder results, it did not change the subjective expressions of their 
interests in the direction of the objective preference scores (1). 

In the present report, the experimental investigation primarily deals 
with the possible relationships between the measured interests of voca- 
tional rehabilitation counselors and their work satisfaction, and work 
efficiency. This study sought to determine whether the Kuder results 
could give a basis for predicting varying degrees of work satisfaction and 
of work efficiency among a selected population of counselors who were 
already on the job when the experiment was begun. 

On the basis of a critical review of the experimental literature on the 
Kuder Preference Record, Super states that “the evidence justifies the 
conclusion that the Kuder Preference Record has now been sufficiently 
well standardized and validated for use in vocational guidance. . . . 
More research needs to be done before the Record can be considered a 
well-understood instrument, but it is already a valuable tool in the 
counselor’s kit” (6, p. 191). 

Evidence of the validity of the Kuder Preference Record in terms of 
enjoyment and efficiency on the job is, according to Kuder’s 1946 manual 
(4), found in only one study. In the latter, Hahn and Williams (3) 
reported significant differences between mean scores on the clerical scale 
for satisfied and dissatisfied workers of three clerical groups of women 
Reservists in the Marine Corps. 

* The author gratefully acknowledges the assistance given by Donald H. Dabelstein 
in the initial steps of the study. 


Method 


While conducting orientation institutes for counselors engaged in 
the State-Federal vocational rehabilitation program for physically and 
mentally disabled civilians, the author administered the Kuder Preference 
Record to the trainees. They were assured of the confidentiality of in- 
dividual results and were requested to turn in their interest profiles for use 
in an experimental investigation about the interest patterns of rehabilita- 
tion counselors. Five months later on the average, they were requested 
to retake the Kuder and also to fill out a Survey Sheet which recorded 
their degree of interest with the job of counseling taken as a whole and 
with distinguishable phases of it. At the same time, a prepared Job 
Rating Schedule was sent to the counselors’ supervisors, who 
were requested to rate the men for efficiency on the job as a whole and 
on various phases of it. Of the initial group of 134 counselors, 10 had 
resigned in the meantime and 24 had some of the necessary records 
missing; the remaining number of 100 is referred to collectively as 
Group A. 

Group B was made up of 46 counselors who had never taken the 
Kuder inventory before. They first were requested to fill out the items 
in the Survey Sheet which included questions about work satisfaction. 
Then the Preference Record was administered. In the present study, 
data on Group B enter into the experimental results only on job satis- 
faction. No job efficiency ratings were secured on this group. 


Counselors’ Ratings on Work Satisfaction 


The counselors were asked on the Survey Sheet (graphic rating scale 
method) to rate the degree of their liking for the job as a whole and for 
particular phases of the job (9 items). 

The checked ratings were converted to numerical scores from 0 to 20. 
The results in terms of means and standard deviations for Groups A 
and B are listed in Table 1. By a comparison of the averages of the 
counselors’ ratings on the different scales, it is possible to estimate the 
relative degrees of satisfaction with several phases of their work. The 
highest amount of work satisfaction seemed to be derived from “interviewing 
clients” and the “job as a whole.” Other phases of the work 
which gave high average satisfaction scores were “promoting the program 
to the public,” “contacting employers for jobs,” “reading scientific 
literature on rehabilitation,” and “experimenting with guidance techniques.” 
Less enjoyment was reported from “writing case histories” 
and doing “rehabilitation work after hours.” The phase of the job 
liked least was “handling clerical details.” 


Table 1 

Self-Ratings of Vocational Rehabilitation Counselors on Degree of Work Satisfaction 
with Their Job as a Whole, and with Particular Aspects of Their Job 

                                                        Mean               “t” 
Work Satisfaction Scale*                    Groups**    Rating    S.D.    ratio† 

Whole Job                                      A         18.8     1.85    2.05 
                                               B         17.9     2.60 
Interviewing                                   A         19.0     1.55    1.18 
                                               B         18.6     1.97 
Promoting the Program                          A         16.4     3.79    1.50 
                                               B         15.2     4.55 
Contacting Employers for Jobs                  A         15.9     3.48     .38 
                                               B         15.7     3.66 
Reading Scientific Lit. on Rehabilitation      A         16.0     3.31 
                                               B         15.5     3.49 
Experimenting with Guidance Techniques         A         16.0     3.37 
                                               B         16.0     3.16 
Writing Case Histories                         A         12.7     4.04 
                                               B         12.9     5.27 
Handling Clerical Details                      A         10.5     4.86 
                                               B          8.9     5.07 
Rehabilitation Work After Business Hours       A         12.5     4.50    3.38 
                                               B          9.4     5.29 

* Conversion of ratings to numerical scores was made on basis of a scale of units 
from 0 to 20. 

** N for A = 100; for B = 46. 

† Values required for statistical significance at 5% level of confidence = 1.98; at 
1% level of confidence = 2.61 (5, pp. 212-3). 

From an inspection of the frequency distributions on the above items, 
it appeared that the converted scores on the different rating scales were 
not normally distributed. All but one appeared to be considerably 
skewed. The range of scores was very wide on all items except two, 
namely “whole job” and “interviewing.” In the latter, the ranges were 
highly restricted to the upper levels of the scales. 

On each of the items dealing with work enjoyment, the differences 
in average self-ratings between Groups A and B appeared slight. However, 
it was necessary to test the differences statistically in order to be 
able to state definitely that they were or were not due to chance. Accordingly, 
the “t” ratios were computed and the only significant differences 
between the groups related to the items “enjoy the job as a whole” 
and “enjoy rehabilitation work after business hours.” These results 
signify that Group A claimed a higher degree of satisfaction than Group 
B in the job as a whole and in overtime work, and also show that the 
differences in average self-ratings on each of the other items are too small 
to be regarded as statistically significant. 
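
The “t” ratios can be approximated from the means, standard deviations, and group sizes given in Table 1. The sketch below is illustrative only: the article does not state which formula was used, so the standard error of the difference is assumed here to be the square root of the sum of the two squared standard errors, and the function name is hypothetical. Because the published means and standard deviations are rounded, the results differ slightly from the printed ratios.

```python
import math


def t_ratio(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> float:
    """Difference between two group means over the standard error of the difference.

    Assumed formula: SE = sqrt(sd_a**2 / n_a + sd_b**2 / n_b).
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


# Group A (N = 100) and Group B (N = 46) figures as printed in Table 1.
whole_job = t_ratio(18.8, 1.85, 100, 17.9, 2.60, 46)
interviewing = t_ratio(19.0, 1.55, 100, 18.6, 1.97, 46)
after_hours = t_ratio(12.5, 4.50, 100, 9.4, 5.29, 46)

for label, value in [("whole job", whole_job),
                     ("interviewing", interviewing),
                     ("work after hours", after_hours)]:
    print(f"{label}: t = {value:.2f}")
```

Under this assumption the ratios for “whole job” and “work after hours” exceed the 5% value of 1.98, while that for “interviewing” does not, in agreement with the conclusions drawn in the text.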


Work Satisfaction and Measured Interests 


In setting up the study, one important hypothesis to be investigated 
was that certain Kuder Preference Record results could be used to predict 
greater enjoyment with various phases of the total job, as well as with 
the job as a whole. Thus, it seemed logical to expect that persons with 
higher scores in the Kuder scales which distinguished the counselors 
from the general population, namely Social Service, Persuasive, and 
Literary, probably would be more satisfied with the job of counseling 
as a whole. It also seemed logical to expect that counselors who came 
out higher on the Scientific scale of the Kuder would experience more job 
interest and satisfaction in experimenting with guidance techniques; 
and that counselors higher in the Literary scale of the Kuder would be 
more apt to enjoy writing up case histories; and that counselors who 
scored higher in the Clerical scale of the Kuder would be less annoyed 
with the handling of the clerical details. 

The possible relationships between Kuder scores and work satis- 
faction were determined by computing correlation coefficients between 
variables that logically could be suspected of showing a significant degree 
of relationship. 

Because the scores on job satisfaction did not appear to be distributed 
normally, as noted above, it was necessary to consider the possibilities 
of curvilinear relationships between scores on the Preference Record and 
the job satisfaction scales. Parenthetically, it may be mentioned that 
the scores on the Preference scales appeared to be normally distributed 
with the exception of the Artistic scale, in which the scores seemed to be 
skewed positively. A scatter diagram was prepared for each of the 
paired variables shown in Table 2 for Group A. An inspection of each 
of the diagrams and of the empirical regression lines for the prediction 
of job-satisfaction scores from Kuder scores indicated no curvilinearity. 
A similar analysis on Group B, with a smaller number of cases than 
Group A, did not appear to be warranted. The statistical data are 
presented in Table 2. It may be seen that six correlation coefficients 
are statistically significant. 

The two paired variables which showed a statistically significant 
correlation for both groups were enjoyment in “contacting employers for 
jobs” and the Kuder Persuasive scale. The evidence was not as clean-cut 
for the other paired variables which showed a statistically significant 
correlation coefficient for one group but not the other. The variables, 
enjoyment in “interviewing clients” and the Kuder Social Service scale, 
showed a correlation coefficient for Group A which was very close to 
zero, although for Group B, the same variables showed a significant 
relationship beyond zero. The latter difference is difficult to explain 
satisfactorily. 


Table 2 

Relationship Between Kuder Preference Scores and Job Satisfaction as a 
Vocational Rehabilitation Counselor 

                                             Kuder 
                                             Preference 
Satisfaction Scales                          Scales         r 

Job as a Whole                               Pers. 
Job as a Whole                               Pers. 
Job as a Whole                               Soc. Ser. 
Job as a Whole                               Soc. Ser. 
Interviewing Clients                         Pers. 
Interviewing Clients                         Pers. 
Interviewing Clients                         Soc. Ser. 
Interviewing Clients                         Soc. Ser. 
Promoting the Program                        Pers. 
Promoting the Program                        Pers. 
Contacting Employers for Jobs                Pers. 
Contacting Employers for Jobs                Pers. 
Reading Scientific Literature on Rehab.      Sci. 
Reading Scientific Literature on Rehab.      Sci. 
Reading Scientific Literature on Rehab.      Lit. 
Reading Scientific Literature on Rehab.      Lit. 
Experimenting with Guidance Techniques       Sci. 
Experimenting with Guidance Techniques       Sci. 
Writing Case Histories                       Sci. 
Writing Case Histories                       Sci. 
Writing Case Histories                       Lit. 
Writing Case Histories                       Lit. 
Writing Case Histories                       Soc. Ser. 
Writing Case Histories                       Soc. Ser. 
Handling Clerical Details                    Cler.         .32 
Handling Clerical Details                    Cler.         .21 
Rehabilitation Work After Hours              Pers.         .12 
Rehabilitation Work After Hours              Pers.         .30 
Rehabilitation Work After Hours              Soc. Ser.     .00 
Rehabilitation Work After Hours              Soc. Ser.     .05 

* (N = 100.) Values of correlation coefficients required for statistical significance 
are .197 at the 5 per cent level of confidence and .256 at the 1 per cent level of confidence 
(5, p. 212). 

** (N = 46.) Values required for statistical significance are .291 at the 5 per cent 
level of confidence and .376 at the 1 per cent level of confidence (5, p. 212). 
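
The significance thresholds quoted in the footnotes to Table 2 can be approximated from critical values of t, since a correlation based on N cases is significant when it exceeds t/sqrt(t² + N − 2). The sketch below is an illustration, not the method of the cited reference: the critical t values are approximate two-tailed figures from standard tables, and the function name is hypothetical.

```python
import math


def critical_r(t_crit: float, n: int) -> float:
    """Smallest correlation that is significant for n cases, given a critical t."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Approximate two-tailed critical t values from standard tables (assumed):
# df = 98 for Group A (N = 100) and df = 44 for Group B (N = 46).
thresholds = {
    ("A", "5%"): critical_r(1.984, 100),
    ("A", "1%"): critical_r(2.627, 100),
    ("B", "5%"): critical_r(2.015, 46),
    ("B", "1%"): critical_r(2.692, 46),
}

for (group, level), value in thresholds.items():
    print(f"Group {group}, {level} level: r = {value:.3f}")
```

The smaller group requires a noticeably larger coefficient at either level, which is one reason a given correlation may reach significance in Group A but not in Group B.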


Work Efficiency and Measured Interests 


Another important phase of this study was to determine the relation- 
ships of the Kuder interest scores to the supervisory ratings on job 
efficiency. The results should indicate the possible value of the interest 
scores in predicting successful performance in the job, or in particular 
phases of the job. For example, did the high interest scores in the 
Persuasive, Literary, and Social Service scales have a relationship to 
proficiency on the job as a whole, and on such phases of the job as inter- 
viewing clients, interpreting psychological tests, using community re- 
sources, having high production and doing quality counseling? Similarly, 
did high Scientific interest scores correlate with effectiveness in 
experimenting on and trying out new professional techniques, etc.? 

At present, there are no satisfactory objective devices to evaluate 
job efficiency in vocational rehabilitation counseling. For this reason, 
the graphic rating-scale method was used, accompanied by instructions 
which sought to improve the reliability and validity of the ratings. 

The items rated were: a. counseling efficiency as a whole; b. conducting 
counseling interviews; c. interpreting psychological test results to the 
client; d. effective use of community resources; e. imparting occupational 
information to the client; f. writing up case reports; g. handling financial 
records for client’s rehabilitation expenses, making up the flow sheets, 
keeping field sheets up to date; h. making talks, speeches and promoting 
the program to the general public; i. making contacts with employers 
to secure job opportunities for his clients; j. reading current scientific 
articles, books, and reports on rehabilitation topics; k. experimenting on 
and trying out new techniques of counseling and guidance; l. continued 
work after regular hours; m. production record on rehabilitations; and 
n. quality of work. 

The frequency distributions of the efficiency ratings as converted 
into numerical scores from 0 to 20 were inspected for indications of 
normality. The distributions generally appeared to be very peaked at 
the center of the scale, usually with shorter peaks at the guide points 
designated as “passable” and “very good” on the graphic scales. The 
scores on each item spread over almost the entire range, and did not 
appear markedly skewed. These indications made it necessary to consider 
the possibility that the relationships between job-efficiency and 
Kuder scores might be curvilinear. Accordingly, scatter diagrams were 
prepared, as well as empirical regression lines for the prediction of 
efficiency ratings from Preference scores. Inspection of the regression 
lines indicated no curvilinear relationships between the job-efficiency and 
the first Kuder scores. There seemed to be no reason to assume that the 
relationships would be of a different type between the job-efficiency and 
the 2nd Kuder scores. 


Table 3 

Relationship Between Kuder Preference Scores and Job Efficiency of Vocational 
Rehabilitation Counselors as Rated by Supervisors 

                                                       r on         r on 
Job Efficiency vs. Kuder Scale                         2nd Kuder    1st Kuder 

Whole Job vs. Mech.*                                     -.12         -.04 
Whole Job vs. Comp.                                       .08          .01 
Whole Job vs. Sci.                                        .02         -.03 
Whole Job vs. Pers.                                       .01          .10 
Whole Job vs. Art.                                        .09         -.14 
Whole Job vs. Lit.                                        .06          .03 
Whole Job vs. Mus.                                        .07          .02 
Whole Job vs. Soc. Ser.                                   .04 
Whole Job vs. Cler.                                       .13          .07 
Interviewing vs. Pers.                                    .14          .16 
Interviewing vs. Soc. Ser.                                .09          .00 
Interpreting Tests vs. Sci.                               .04          .06 
Use of Community Resources vs. Soc. Ser.                  .05          .13 
Imparting Occupational Information vs. Mech.                           .14 
Imparting Occupational Information vs. Comp.              .06          .07 
Imparting Occupational Information vs. Sci.               .14          .03 
Imparting Occupational Information vs. Pers.              .05          .17 
Imparting Occupational Information vs. Soc. Ser.          .09 
Imparting Occupational Information vs. Cler.              .00          .00 
Writing Case Histories vs. Sci.                           .03 
Writing Case Histories vs. Lit.                           .07          .10 
Writing Case Histories vs. Soc. Ser. 
Handling Records vs. Cler.                                             .08 
Publicly Promoting the Program vs. Pers.                  .24          .32 
Contacting Employers vs. Pers.                            .11          .19 
Reading Scientific Literature vs. Sci.                   -.07         -.11 
Reading Scientific Literature vs. Lit.                    .26          .26 
Experimenting with Guidance Techniques vs. Sci.           .04         -.02 
Rehabilitation After Hours vs. Soc. Ser.                 -.09          .10 
Production Record vs. Pers.                              -.10          .04 
Production Record vs. Soc. Ser.                          -.10         -.03 
Quality of Work vs. Pers.                                 .01 
Quality of Work vs. Soc. Ser.                            -.01          .05 

* Values required for statistical significance at 5% level of confidence = .197; at 
1% level of confidence = .256 (5, p. 212). 

The product-moment r’s found between the Preference Record scores, 
both first and second tests, and the supervisory ratings of job efficiency 
are presented in Table 3. It will be seen that correlation coefficients 
were computed only between those pairs of variables which might be 
suspected of yielding statistically significant relationships. Of the 66 
coefficients, only five were statistically significant. 

These results show that higher Kuder scores on the Persuasive scale 
tend slightly but definitely to indicate better job performance in promot- 
ing the program to the general public, and that higher scores on the 
Literary scale tend slightly but definitely to indicate greater activity in 
keeping up with the scientific literature in the field of rehabilitation. The 
evidence is not as clean-cut for the statement that there is a real relation- 
ship between job efficiency in contacting employers for jobs for handi- 
capped clients and higher scores on the Persuasive scale of the Preference 
Record. The latter variables are found to be related at the five per cent 
level of confidence when the first Kuder test score is considered, but there 
is no statistical significance when the second Kuder score is involved. 


Supervisors’ Ratings on Job Efficiency Elements 


It is interesting to study the distributions of the supervisory ratings 
on the several scales. A comparison of the averages of the efficiency 
ratings on the different items may indicate that supervisors are more 
satisfied with counselors’ performance in some respects than with others. 
Perhaps the differences in average scores roughly indicate relative 
strengths and weaknesses in counselors’ performance in civilian rehabilita- 
tion counseling at least as regarded by the supervisors. The foreword 
which accompanied the efficiency rating forms instructed the supervisors 
to rate their counselors so that the middle of the scale would be approxi- 
mately the average for all counselors. Upon this background of in- 
structions, the ratings on the scales resulted in the scores presented in 
Table 4. 

According to the average ratings of the supervisors, they were relatively 
well satisfied with the “quality of work” done by the rehabilitation 
counselors. This item ranked first in order of magnitude. The supervisors 
also seemed to be relatively pleased with the counselors’ efforts 
in the aspects of the job having to do with community contacts because 
the items next in order of magnitude were “use of community resources” 
and “contacting employers for jobs.” The “production record” was 
less satisfactory than the “quality of work.” The counselors were rated 
lowest in the items “experimenting on and trying out new professional 
techniques” and “reading scientific publications on rehabilitation topics.” 
They also were rated less satisfactory on “promoting the program to the 
general public” and “interpreting psychological tests.” Although the 
statistical data are not presented in this article, it has been found that 
the differences between means having rank orders (5) and above as 
shown in Table 4, as compared with means having rank orders (10) and 
below, are statistically significant beyond the 1 per cent level of confidence. 
This signifies that the differences between the more extreme means 
would be expected to appear again in a similar sample if the experiment 
were repeated under the same conditions. 


Table 4 


Supervisory Ratings on Counselors’ Job Performance in Civilian Rehabilitation Work 





Item Rated                                            Mean*     S.D. 

Quality of Work                                       12.6 
Using Community Resources                             12.2 
Contacting Employers for Jobs                         12.1 
Interviewing Clients                                  12.0 
Writing Case Histories                                12.0 
Job as a Whole                                        12.0 
Handling Records                                      11.6 
Production Record                                     11.5 
Imparting Occupational Information                    11.3 
Interpreting Tests                                    11.0 
Rehabilitation Work After Hours                       10.5 
Promoting Program to Public                           10.5 
Reading Scientific Publications on Rehabilitation     10.4 
Experimenting with Guidance Techniques                10.0 





* The ratings were converted into numerical scores from 0 to 20. 


An analysis of the magnitude of the standard deviations for all the 
items makes interesting material for a further observation. The highest 
standard deviations appeared for the items “production record,” 
“contacting employers for jobs,” “promoting the program to the public,” and 
doing “rehabilitation work after hours.” The lowest standard deviations 
appeared for the items “reading scientific publications on rehabilitation,” 
“quality of work,” “job as a whole,” and “interviewing clients.” Two 
reasons may be ascribed for these differences in the magnitude of the 
standard scatter of scores. One is that the group of the highest standard 
deviations relates to items more easily counted numerically, and that 
the second group above relates to more difficult qualitative judgments. 
In other words, supervisors spread out their ratings on counselors’ job 
efficiency more on “production record” than on “quality of work” 
because the former is more objective. A second possible reason for the 
higher and lower magnitudes of the standard deviations is that the 
counselors are more alike in the group of items including “quality of 
work” than in the group of items including “production record.” The 
first reason is the preferred explanation for the differences in standard 
deviations. 


Summary 


Vocational Rehabilitation counselors were requested to take the 
Kuder Preference Record, to fill out a Survey Sheet which indicated their 
work satisfaction on the job, and also were rated for work efficiency by 
their supervisors. It was found that: 


1. The counselors derived a high degree of satisfaction from their job 
as a whole, and from such phases of it as interviewing clients, contacting 
employers for jobs, promoting the program to the public, experimenting 
with guidance techniques and reading scientific literature. They least 
enjoyed the handling of clerical details, overtime work, and the writing 
of case histories. 

2. Higher scores on particular Kuder scales had low but significant 
relationships to work satisfaction for only several aspects of the coun- 
selor’s job. However, the magnitude of the correlations was much too 
low for purposes of individual prediction. Higher scores on the Kuder 
Persuasive scale indicated greater enjoyment in contacting employers for 
jobs. Other evidences of significant relationship were not as consistent, 
appearing for one experimental group but not the other. 

3. Higher scores on particular Kuder scales had low but significant 
relationships to work efficiency for only several aspects of the counselor’s 
job. However, the correlation coefficients were too low for purposes of 
individual prediction. Higher scores on the Kuder Persuasive scale 
indicated greater work efficiency in promoting the program to the public; 
higher scores on the Kuder Literary scale indicated greater efficiency in 
keeping up with literature on rehabilitation. 

4. The efficiency of counselors was rated more alike in such job 
elements as quality of work, interviewing clients, keeping abreast of 
modern scientific literature, and in the job as a whole. The counselors 
were rated less alike in such aspects of job efficiency as production record, 
contacting employers for jobs, promoting the program to the public, and 
working after hours. These differences probably were due to greater 
difficulty in rating the counselors on items depending upon qualitative 
rather than quantitative judgments of job performance. 

Received December 20, 1948.


Certain Rorschach Response Categories and Mental Abilities


J. R. Wittenborn 


Yale University 


It is common practice among Rorschach technicians to include, as a
part of their personality appraisals, some remarks concerning the subjects’
“intelligence,” “intellectual potential,” “mental capacity,” or
“intellectual efficiency.” Such evaluations are derived by the examiners
from a variety of considerations.

The Rorschach scoring categories most commonly used in estimating 
mental capacity or achievements are: a. the total number of responses 
(R); b. the number of whole responses, i.e., responses based on the whole 
card (W); c. the number of responses in which Human Movement is 
seen (M); and d. the form level of responses, i.e., the accuracy and 
detail with which forms are seen. 

These aspects of a Rorschach record are not employed independently 
in making appraisals. For example, the number of whole responses is 
dependent upon the accuracy with which forms are perceived. Since 
ability estimates offered by Rorschach workers make use of a wide variety 
of informal cues, it must be emphasized that it is not a purpose of the 
present investigation to determine the nature of the relationship between 
mental test scores and a mental ability estimate based upon a total
Rorschach evaluation. 

The purpose of the present investigation is to examine the ability 
implications of certain objectively determined, quantitatively expressed 
classes of response which are unique to the Rorschach. Specifically the 
investigation is concerned with the location (i.e., the portion of the blot 
employed) and the determinant (i.e., the shading, color or projected 
movement employed in forming a response) factors; these are unique to 
the Rorschach. The content of perceptions, as well as their accuracy 
(form level), are general factors in projection and their significance is not 
unique to the Rorschach. Therefore, the content and form level of 
responses are not included in the present analysis. 

In meeting the purposes of the experiment, the analysis of data is 
conducted with respect to the following questions: 


1. What is the order of the relationships between certain Rorschach
response scoring categories and test evidence of mental ability? Are
they negligible relationships which permit the kind of gross distinctions
between ability levels that can be made from casual observations, or do
they provide refined distinctions comparable to those provided by mental
tests? If the relationships are of high order, it should be generally known
so that they can be put to extensive use. Since Rorschach responses
appear to be less a function of specific education experiences than the
performances currently sampled by most mental tests, it is conceivable
that the demonstration of high relationships could influence future
mental test procedures.

2. What is the nature of the relationship between various classes of 
mental ability and various Rorschach response categories? Intelligence, 
general ability, etc., are words for groups of human abilities, but the 
groups have no standard consistency. If a full appreciation of a relationship
between a Rorschach response category and a mental ability is
to be had, the nature of the mental ability in question must be specified. 
Accordingly, in the present analysis measures of verbal, spatial, and 
numerical abilities are employed. In addition, general measures of 
scholastic ability are included. If a pattern of relationship could be 
demonstrated between certain Rorschach response categories and certain 
classes of mental ability, an improved understanding of both Rorschach 
responses and of the mental abilities in question might result. 


The Experimental Plan 


The subjects were a heterogeneous group of 68 Yale students who 
had been in a speeded reading course or had consulted the writer. The 
Rorschach tests used in this analysis were administered and scored by 
Klopfer-trained examiners.

The ability data employed in the analysis, the results of the College 
Entrance Examinations and the results of the Yale Freshman Aptitude 
tests, were taken from the files of the Yale University Student Appoint- 
ment Bureau. Scores for the following variables were taken from each 
student’s file and used in the analysis: 


I. Scholastic Ability: 1. First Semester Freshman Year grade average; 
and 2. General Scholastic Prediction for Freshman Year. 

II. Verbal Ability: 1. College Entrance Scholastic Aptitude Verbal 
test; 2. College Entrance English Essay test; and 3. Yale Verbal Reason- 
ing test. 

III. Numerical Ability: 1. College Entrance Scholastic Aptitude 
Mathematical test; and 2. Yale Quantitative Reasoning test. 

IV. Spatial Ability: 1. Yale Spatial Visualization test; and 2. Yale 
Mechanical Ingenuity test. 


Using only Yale undergraduates as subjects restricts the range of
ability sampled.1 Probably no member of the present group has a
verbal IQ as low as 115. The range of ability sampled is less restricted
than at first might be supposed, however; the high levels are very well
represented. Moreover, some of the tests which are not relevant to
general academic achievement, e.g., the measures of spatial ability, may
include a very wide range of scores. In general it may be claimed that
using a variety of tests which sample relatively homogeneous, specifiable
abilities results in less range restriction than would result from using one
general ability score, e.g., an IQ.

There are two sets of considerations to be observed in generalizing from
the results of the present study: a. If no significant relationships are 
found in the present sample, it is unlikely that important linear relation- 
ships would be found in a more heterogeneous sample; and b. If the 
relationships in the present sample are highly significant, it is possible 
that they would have a practical predictive value in a more heterogeneous 
sample. 

An answer to the two questions raised in the introduction calls for an 
examination of the relationships between each of the nine mental ability 
measures and each of eighteen Rorschach categories.2

Since there were nine mental tests to be correlated with eighteen
Rorschach categories (a total of 162 determinations), it was decided first
to make the simplest preliminary examination of each possible relationship,
and subsequently to make a thorough study of the promising relationships.
For this purpose the 10 highest and 10 lowest people in each
mental test distribution were selected. This provided nine different
sets of high and low standing students. Scores on each Rorschach
scoring category were obtained for the high and low standing groups
for each test.
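
The screening step just described (take the ten highest and ten lowest scorers on a test, then compare their Rorschach scores) can be sketched in Python. This is an illustrative sketch only, not the author's worked procedure: the function names, the record layout, and all scores are hypothetical, and ties at the cut are simply broken by input order.

```python
from statistics import mean

def extreme_groups(records, test, k=10):
    """Return (high, low): the k records scoring highest and lowest on `test`."""
    ranked = sorted(records, key=lambda r: r[test], reverse=True)
    return ranked[:k], ranked[-k:]

def mean_difference(records, test, category, k=10):
    """Mean Rorschach score of the high group minus that of the low group."""
    high, low = extreme_groups(records, test, k)
    return mean(r[category] for r in high) - mean(r[category] for r in low)

# Hypothetical data: 68 subjects, one test score and one response total (R) each.
subjects = [{"verbal": 400 + 5 * i, "R": 30 + (i % 7)} for i in range(68)]

high, low = extreme_groups(subjects, "verbal")
diff = mean_difference(subjects, "verbal", "R")
print(f"high-minus-low difference in mean R: {diff:.1f}")
```

Repeating the call for each of the nine ability measures would give the nine pairs of groups mentioned in the text.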


Analysis of Data 


Table 1 shows the average number of Rorschach responses for the ten
students who scored highest and for the ten students who scored lowest
on each of the tests. The two measures of scholastic ability are not
tests; one is merely a grade average and the other is a prediction based
on a weighted combination of secondary school grades and mental test
scores. With the exception of first semester average, all of the mental
measurements show evidence of a positive relationship with the number
of Rorschach responses. None of these differences was statistically
significant when a t test was made.


1 This restriction does not preclude the possibility that these tests can show high
correlation in a sample of Yale undergraduates. Some of the above tests have
intercorrelations as high as .70.

2 These are: 1. W, Whole Blot; 2. D, Large Usual Detail; 3. d, Small Usual Detail;
4. Dd, Unusual Detail; 5. S, White Space; 6. M, Human Movement; 7. FM, Animals
in Action; 8. m, Abstract or Inanimate Movement; 9. k, Shading on a Three Dimensional
Expanse projected on a two dimensional plane; 10. K, Shading or Diffusion;
11. FK, Shading in Three Dimensional Expanse in Vista or perspective; 12. F, Form
only; 13. Fc, Shading or Surface Texture; 14. c, Shading and Texture; 15. C’, Achromatic
Surface Color; 16. FC, Definite Form with Color; 17. CF, Color with Indefinite
Form; and 18. C, Color only.

The total number of Rorschach responses was found, upon inspection,
to be positively skewed for all groups, thus showing a t test of the differences
between the means of the total number of responses to be inappropriate.
As a consequence the logarithm of each subject’s total
number of responses was found. The distributions of the logarithms
were roughly symmetrical in form, and the t test was repeated based on
the logarithms of the total number of responses.
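
The idea of taking logarithms before the t test can be illustrated with a short sketch. Everything here is an assumption made for illustration: the response totals are invented, a pooled-variance t is used because the source does not give its formula, and base-10 logarithms are chosen arbitrarily (the base does not change t, since changing it only rescales every value by a constant).

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Student's t for two independent samples, using the pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical response totals (R) for a high and a low group of ten each;
# both lists have the long right tail typical of positively skewed counts.
high_R = [22, 25, 28, 30, 33, 36, 41, 48, 65, 110]
low_R = [18, 20, 23, 25, 27, 30, 32, 38, 52, 85]

t_raw = pooled_t(high_R, low_R)
t_log = pooled_t([math.log10(x) for x in high_R],
                 [math.log10(x) for x in low_R])
print(f"t on raw counts: {t_raw:.2f}; t on logarithms: {t_log:.2f}")
```

With 10 subjects in each group the comparison has 18 degrees of freedom.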


Table 1

The Average Number of Rorschach Responses for the 10 Students Scoring Highest and
the 10 Scoring Lowest on Each of the Nine Mental Ability Measurements

                                          Average Number of Rorschach Responses

                                          10 Highest    10 Lowest
                                          on Test       on Test       Difference

Scholastic
1. First Semester Average                 32.2          35.0          -2.8
2. General Scholastic Prediction          51.1          35.2          15.9

Verbal
1. Scholastic Aptitude Verbal             38.4          33.8           4.6
2. English Essay                          49.4          40.6           8.8
3. Verbal Reasoning                       46.4          38.4           8.0

Numerical
1. Scholastic Aptitude Mathematical       48.9          38.1          10.8*
2. Quantitative Reasoning                 42.6          38.5           4.1

Spatial Ability
1. Spatial Visualization                  39.2          36.6           2.6
2. Mechanical Ingenuity                   41.1          31.1          10.0

* Difference between logarithms significant at the 5% level.


Only one test, Mathematical Aptitude, showed a difference significant
at the five per cent level. The fact that all of the differences (with one
exception) are positive indicates a probable positive relationship between
some of the mental tests and the total number of Rorschach responses.
The trends (with the possible exception of the one significant difference)
are too slight to afford any qualitative evaluation of the pattern of ability
and Rorschach response relationships. Considering the great difference
between the low and high level ability groups, the findings offer little
support for the practice of using the total number of Rorschach responses
as an evidence for mental ability among individual college students.

Because of the indication of a slight positive relationship between 
total number of responses and measures of mental ability, the location 
and determinant scores for each individual were expressed as a per cent 
of his total number of responses.3 Both the raw scores and the per cent
of total scores were analyzed. Despite the large number of differences 
examined, very few trends were discovered and almost none of them 
was significant. Only the promising trends will be presented in the 
following paragraphs. 
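
The per cent adjustment described above amounts to dividing each category count by the record's total number of responses. A minimal sketch follows, assuming that each response is scored in exactly one location category so that the counts sum to R; the function name and the counts are hypothetical.

```python
def percent_of_total(category_counts):
    """Express each category count as a per cent of the record's total R."""
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("record contains no responses")
    return {name: 100.0 * n / total for name, n in category_counts.items()}

# Hypothetical location scores for one record (W, D, d, Dd, S); R = 40.
location = {"W": 12, "D": 20, "d": 4, "Dd": 3, "S": 1}
pct = percent_of_total(location)
print(pct)
```

Expressed this way, a subject who gives many responses of every kind no longer appears high on a category merely because his record is long.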

Table 2

A Comparison Between Pairs of High and Low Scoring Groups on the Basis of Both
the Number and the Per Cent of Human Movement Responses

                                          High Group       Low Group        Difference
Test                                      No. M   % M      No. M   % M      No. M   % M

Scholastic
1. First Semester Average 5.3 3.6 hoe 
2. General Scholastic Prediction 7.9 4.6 3.3 


Verbal 

1. Scholastic Aptitude Verbal 8.8 4.2 4.6 
2. English Essay 8.2 8.1 A 
3. Verbal Reasoning 8.1 3.5 


Numerical 
1. Scholastic Aptitude Mathematical 8.3 : 4.0 
2. Quantitative Reasoning 8.2 F 4.7 


Spatial Ability 
1. Spatial Visualization 6.5 4. 2.2 4.1 
2. Mechanical Ingenuity 7.9 5. 2.5 5.0 








Table 2 indicates the nature of the relationship between the number
of Human Movement (M) responses and the mental ability measures.
It is apparent that there is a general tendency for the number of Human
Movement responses to be positively related with mental ability measurements.
This tendency is not wholly due to the fact that mental ability
is slightly related to total number of responses; this is indicated by the
consistent positive differences between the groups in per cent Human
Movement responses. The number of Human Movement responses, like
the total number of responses, proved to be positively skewed; as a consequence
a t test was based on the logarithms of the per cent of Human
Movement scores. None of the differences proved to be significant at
the five per cent level.


3 In his study of the relationships between Bellevue-Wechsler scores and Beck’s
Rorschach scoring factors, Wishner (8) makes no adjustment for the manner in which
some of the scoring factors may be influenced by the total number of responses (R).
This is regrettable because his data suggest that the validity he claims for Z could
be largely if not entirely due to R.


Table 3

A Comparison Between Pairs of High and Low Scoring Groups on the Basis
of the Per Cent of Whole Responses

                                          High Group    Low Group     Difference
                                          % W           % W           % W

Scholastic
1. First Semester Average
2. General Scholastic Prediction

Verbal
1. Scholastic Aptitude Verbal
2. English Essay
3. Verbal Reasoning

Numerical
1. Scholastic Aptitude Mathematical
2. Quantitative Reasoning

Spatial Ability
1. Spatial Visualization
2. Mechanical Ingenuity


Table 4

Evidence for Relationship Between Tendency to Give Achromatic Color Responses and
Tendency to be in the High or Low Scoring Groups for Each of the Mental Tests

                                          χ²*       P

Scholastic
1. First Semester Average                           .30
2. General Scholastic Prediction                    .02

Verbal
1. Scholastic Aptitude Verbal                       .01
2. English Essay                                    .30
3. Verbal Reasoning                                 .02

Numerical
1. Scholastic Aptitude Mathematical                 .02
2. Quantitative Reasoning                 .22       .70

Spatial Ability
1. Spatial Visualization                  .833      .30
2. Mechanical Ingenuity                             .20

* Without Yates correction.

The number of whole responses showed little promise of being related 
with the mental ability scores. Table 3 shows the ambiguous finding 
for per cent whole responses. 

Only one of the other scoring categories for determinant or location 
factors showed evidence of being related with mental ability. This was 
the number of achromatic color responses (C’). 

Since for any individual the number of achromatic color responses
(C’) was small, no t test was feasible and no correction for the influence
of the total number of responses on the number of achromatic color
responses was made. The reliability of the relationship between the
number of achromatic color responses and mental ability scores was
examined by means of a χ² test of independence (Table 4).
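
A χ² test of independence without the Yates correction can be sketched for a fourfold table as follows. The source does not say how the achromatic color scores were dichotomized, so the layout here (high or low group against "gave at least one C’ response" or "gave none") is an assumption, as are the function name and the counts. The P value uses the fact that, with one degree of freedom, the upper tail of χ² equals erfc(sqrt(χ²/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no Yates correction."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a marginal total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    return chi2, p

# Hypothetical counts: rows are the high and low groups on one test;
# columns are "at least one C' response" and "no C' response".
chi2, p = chi_square_2x2(8, 2, 3, 7)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

With only ten subjects per group the expected cell frequencies are small, which is one reason the presence or absence of the Yates correction is worth stating, as the table's footnote does.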


Discussion 


The experimental findings are discussed with respect to the two 
questions to which the experiment is specifically relevant: 1. What is 
the order of any linear relationship between a Rorschach response 
category and test evidence for mental ability? 2. What is the pattern 
of linear relationships between various classes of mental ability and 
various Rorschach response categories? 

With respect to the first question, it is apparent that no linear relationship
of sufficient strength to justify individual evaluation exists between
any type of mental ability sampled and any one of the usual
Rorschach location or determinant scoring categories. The qualification 
“linear” is offered because it is possible that at a low level of ability a 
more appreciable relationship exists between mental ability and fre- 
quency of responses in certain of the Rorschach categories. Such dis- 
continuous or curvilinear relationships have not proved to be important 
in mental ability studies, however. 

Because of the paucity of evidence for reliable relationships between 
mental ability and the selected Rorschach response categories, the second 
question becomes irrelevant. The slight trends observed give no hint 
that certain types of responses are correlated with certain types of ability. 

Obviously the present findings do not preclude the possibility that
the Rorschach may be used in some manner or other to predict some
aspect of mental ability. The present study does indicate the limited
value of Rorschach location and determinant categories as evidence for
mental ability. This suggests that the accuracy of Rorschach perceptions
(form level ratings (4)) and other cues are the primary basis for any
valid appraisal of mental ability. Such cues are not particularly objective;
their evaluation is informal and not well standardized. The
reliability of form level ratings accrues from the consistency of the
examiner,4 and their validity is dependent upon his judgment. Thus
it appears that the most formal and objective aspects of a Rorschach 
protocol (the location and determinant category scores) have almost 
no validity. The remaining factors (form level and the other purely 
qualitative cues) are likely to be unreliable or, at best, to possess a 
reliability which is more a characteristic of the examiner than of the 
Rorschach procedure. Concerning the possible validity of accuracy of 
perceptions as an evidence for mental ability, it is of interest that Beck’s 
(1) F plus % (probably more reliable than Klopfer’s form level ratings) 
was found by Hertz (2) to be correlated with mental ability; Wishner 
(8) could not confirm this, however. 


Summary and Conclusions 


The present study is an examination of the relationships between 
measures of scholastic, verbal, numerical, and spatial abilities and the 
commonly used Rorschach scoring categories for location and deter- 
minant factors. The subjects were a sample of sixty-eight Yale students. 


The findings may be summarized in the following manner: 


1. Although the total number of responses, the number of whole
responses, or the number of Human Movement responses is often used
as a part of the procedure for estimating mental ability from Rorschach
protocols, in the present sample none of them has sufficient validity
to justify use for distinguishing between individual college students of
different levels of ability.

2. If the relationship between any Rorschach location or determinant
category and any of the types of mental ability used in the
present study is linear, the evidence from this sample indicates that the
value of these categories for predicting individual mental ability is so scant
as to make their use at any ability level uneconomical and misleading.

3. The present negative or negligible findings do not preclude the 
possibility that some examiners, employing other aspects of the protocol 
or clues not gained from the Rorschach responses, may arrive at valid 
estimates of some sort of mental ability. 

4. There is evidence of a slight tendency for the total number of
Rorschach responses (R) to be positively correlated with several measures
of mental ability. This finding requires that all of the other comparisons
be corrected for differences in total number of responses in order
to eliminate the spurious effect of a third variable.

4 This was recognized by Wishner (8).

5. Two of the Rorschach scoring categories based on the determinants 
of a response (the color, shading, or movement factors) show evidence for 
a slight positive relationship with measures of mental ability. They are 
the number of Human Movement responses and the number of achro- 
matic color responses. 

6. None of the Rorschach categories based on the location factor 
(portion of the card used in forming a response), is related with any of 
the measures for mental ability. Significant trends were absent not only 
among the skewed raw scores but among their logarithms as well. 
Received November 18, 1948.


Modification of Academic Performance
through Personal Interview *


Alex C. Sherriffs 


University of California 


Among the many problems facing university teachers today is that 
of the large class. Each year finds a greater proportion of university 
courses with enrollments in the hundreds. Some of the larger uni- 
versities report over a thousand students in the beginning courses of 
certain popular fields. 

Most instructors feel that the large class is an educational hazard. 
The negative aspects perhaps most frequently cited include the minimal 
opportunity for student participation during lecture hours, the necessity 
for using recognition type examinations which usually do not call for 
the integration of course material, serving only the purpose of providing 
a basis for grading students, and the essential lack of contact between 
individual students and the course instructor. 

It is with one phase of this latter aspect that this paper is concerned. 
The experiment reported here is intended to throw some light on the 
significance of the contact of individual students with their instructor, 
especially in the situation of the large class. 

This experiment was formulated on the basis of three hypotheses. 
These were: (1) that those students of a large class who felt themselves 
to be known as individuals to their instructor would demonstrate more 
effective learning of course material than would their fellows not so 
known; (2) that there would be demonstrable individual differences in 
the effects of being known to the instructor; and, (3) that such individual 
differences could be predicted with some accuracy. 


Procedure 


To test the first hypothesis, it was decided to subject a random sample 
of students in a large class to a sixty-minute interview by their instructor 
during the week following the first midterm examination. Scores on 
this examination would serve as a baseline against which to compare the 
performance of these students on examinations following the interview. 
The remainder of the class would serve as the control group. 


* The writer is indebted to Edna Adelson and to Joseph Adelson for technical
assistance in this study.

To test the second and third hypotheses, judgments would be obtained
on the students interviewed as to certain personality variables.
These variables would be characteristics considered likely to modify the 
effect of an interview contact by the instructor. Students high on such 
characteristics would be compared in their performance on examinations 
taken after the time of the interviews with those low on these character- 
istics, and both of these groups would be compared with the non-inter- 
viewed students of the class. 


The Subjects 


The class chosen for this experiment was the beginning survey course 
in psychology at the University of California. This class was chosen 
simply because of its availability and its large enrollment. The experi- 
menter was the instructor, and some 257 students were registered and 
took the examinations throughout the course. 

This course is open only to those not intending to major in psychology. 
The students were all freshmen and sophomores who ranged in age from 
17 to 24, with a mean age of 19.0 for the group. 

The course extended over a sixteen-week period, with three lectures 
each week, and with objective recognition type examinations. Midterms 
were administered during the fifth and tenth weeks, and the final examina- 
tion was held at the end of the sixteenth week. All students in the 
course were required to serve as subjects for two hours of laboratory 
experiment during the semester. The interview for the subjects of the 
present investigation counted as one of the regular laboratory hours. 

A sample of thirty-four students was selected for the experimental
group by including every eighth student on the class roll. A check
indicated the sample to be representative of the class as a whole on the
variables of age, sex, and academic major. Of importance was the fact
that the distribution of scores made on the first midterm by this group
was highly similar to that made by the rest of the class (see Table 1).
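
Selecting every eighth name from a roll is a systematic sample, which can be sketched as below. The function name and the roll are hypothetical. In particular, the roll length of 272 is an assumption chosen only so that the sketch returns 34 names; the source reports only the 257 students who took all of the examinations, not the length of the original roll.

```python
def every_kth(roll, k=8, start=None):
    """Systematic sample: every k-th entry of `roll`, beginning at index `start`."""
    if start is None:
        start = k - 1  # take the 8th, 16th, 24th, ... name
    return roll[start::k]

# Hypothetical class roll of 272 names.
roll = [f"student_{i:03d}" for i in range(1, 273)]

sample = every_kth(roll)
chosen = set(sample)
remainder = [name for name in roll if name not in chosen]
print(len(sample), len(remainder))
```

A systematic sample of this kind is representative only if the order of the roll is unrelated to the variables under study, which is what the check on age, sex, and major was meant to establish.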


Table 1

Comparison of the Experimental Group with the Remainder of the Class
on the First Midterm Examination

                                    Mean      S.D.

Experimental Group (N = 34)         49.5      4.82
Remainder of Class (N = 223)


The Interview and the Personality Variables. Since the main function
of the interview was to cause each student of the experimental group to 
feel that he was known to the instructor as a definite individual, and 
since it was also desired to gain information concerning each subject so 
as to make certain personality judgments, the interviews were directed 
at procuring life history and attitudinal material. The instructor care- 
fully avoided discussion of material of the course or of the student’s 
reaction to the class. The explanation given the student for the inter- 
view was that it was desired to know as much as possible of the interests 
and backgrounds of those enrolling in this course. 

The personality variables chosen from among those likely to have 
significance in relation to the effect of the interview on the student’s 
academic performance follow: 


1. Self-tension. The amount of tension felt by the student as to his
own adequacy and worth.

2. Family-tension. The amount of tension felt by the student in his
family relations, in regard both to parents and siblings.

3. Social-tension. The amount of tension felt by the student in his
social relations.

4. Over-all tension. The general level of tension and anxiety under
which the student functions, taking into account the above
three areas.

5. Achievement need. The importance to the student of high academic
grades.

6. Affection need. The importance to the student of receiving a
constant supply of warmth and affection from others.

7. Praise need. The importance to the student of praise and recognition
from others.


Obviously these variables would not be completely independent, but 
it seemed that their individual meaning was sufficiently separate to be 
useful for this study. Intercorrelation of measures of variables was no 
handicap so long as a true relationship was represented. The real 
difficulty lay in not having independent observers to obtain the different 
measures. By the nature of the study only the course instructor could 
interview the group of subjects. The amount of intercorrelation resulting
from “halo effect” operating on the one interviewer cannot be
determined.

Five-point rating scales with defined points were utilized for the
judgments of the four tension variables. These were rating scales
previously found to be useful by the writer.1 Seven-point rating scales,
with the points defined in terms of probability of occurrence, were employed
for the ratings of the subjects on the three “need” variables.
These rating scales follow closely the method outlined by Murray.2

1 Sherriffs, A. C. The “Intuition Questionnaire”: A new projective test. J. abnorm.
soc. Psychol., 1948, 43, 326-337.


Results 3


1. First Midterm Examination. The first task was to assess differ- 
ences on the first midterm examination between the randomly selected 
sample and the remainder of the class. This examination, it will be
remembered, was administered before the members of the experimental 
group were interviewed. The pertinent data for this comparison appear 
in Table 1. 

This comparison does not suggest that the experimental group was 
different from the remainder of the class in terms of performance on the 
first midterm examination. 


Table 2

Comparison of the Experimental Group with the Remainder of the Class in Terms of
the Shifts in Scores from the First Midterm to Subsequent Examinations

                                  Midterm I to Midterm II      Midterm I to Final
                                  Mean     S.D.     t          Mean     S.D.     t

Experimental Group (N = 34)       +6.6     5.52                +80.4    9.99
Remainder of Class (N = 223)      +4.3     5.31                +76.3    12.08

* Significant at the 2 per cent confidence level.


2. Performance on Second Midterm Examination and on Final Ex- 
amination as Compared with Performance before the Interview. Suggestions 
as to the effect of the interview on the performance of the experimental 
group of subjects come from comparisons of this group with the re- 
mainder of the class in their functioning on later examinations relative 
to their functioning on the first, pre-interview, midterm. The mean 

2 Murray, H. A. Explorations in personality. New York: Oxford University Press, 
1938. 


3 All estimates appearing in this paper of the significance of the differences between 
means are based on the t test. Comparisons involving a two part split of the thirty-four 
subject experimental group require a t of 2.04 to be significant at the 5 per cent con- 
fidence level, and a t of 2.74 to be significant at the 1 per cent confidence level. Those 
comparisons which involve the 223 subjects not included in the experimental group 
require a t of 1.96 to be significant at the 5 per cent level, and a t of 2.58 to be significant
at the 1 per cent level.

differences in points scored on the first midterm examination and those
scored on the second midterm and those scored on the final examination 
are presented in Table 2. The variabilities of the shifts in performance, 
and the significance of the differences between the shifts of the experi- 
mental group and of the remainder of the class are also shown. 

These comparisons suggest that the interviewed group of students 
improved more than did the remainder of the class in their performance 
on the second midterm examination held four weeks after their contact 
with the instructor. The difference in improvement is significant at 
the 2 per cent confidence level. The difference still favors the experi- 
mental group at the time of the final examination some ten weeks later, 
but this latter difference is not significant at the 5 per cent level. 
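
The comparison in Table 2 can be checked from the printed means, standard deviations, and group sizes. The following is a minimal sketch in Python, assuming a pooled-variance t test for two independent groups (footnote 3 states only that the t test was used); the function name pooled_t is illustrative and does not come from the original report.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n_a + n_b - 2
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Shifts from Table 2: experimental group (N = 34) vs. remainder of class (N = 223).
t_mid, df = pooled_t(6.6, 5.52, 34, 4.3, 5.31, 223)
t_final, _ = pooled_t(80.4, 9.99, 34, 76.3, 12.08, 223)
print(f"Midterm I to Midterm II: t = {t_mid:.2f} (df = {df})")
print(f"Midterm I to Final:      t = {t_final:.2f}")
```

Under this assumption the shift to the second midterm gives a t of roughly 2.3 and the shift to the final a t of roughly 1.9, which agrees with the significance levels stated in the text.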


Relationship of Rated Personality Variables 
to Effects of Interview on Performance 


Distributions of ratings were made for each of the seven personality 
variables to be studied. These distributions were then considered sepa- 
rately and split in each case so as most closely to accomplish a 50-50 
division of the subjects on that particular variable. Comparisons could 
then be made of the examination performance of those subjects rated 
higher on each variable with the examination performance of those rated 
lower. 

1. Relationship of Rated Personality Variables to Performance on the 
First Midterm Examination. The means and standard deviations of the 
scores made on the first midterm examination by those rated higher and 
those lower for each personality variable are shown in Table 3.  Esti- 
mates as to the significance of the differences between these means are 
also indicated. 

Table 3

Performance on the First Midterm Examination of Those of the Experimental Group
Rated Higher and of Those Rated Lower on Seven Personality Variables

                      Higher Ratings       Lower Ratings
Variable              Mean     S.D.        N      Mean     S.D.

Self Tension          48.1     5.82        18     50.8     3.24
Family Tension        48.3     5.54        20     50.4     4.03
Social Tension        48.1     5.06        17     50.9     4.09
Overall Tension       47.5     4.77        21     50.8     4.39
Achievement Need      50.0     4.77        21     49.2     4.83
Affection Need        45.3     4.67        24     51.3     4.38
Praise Need           46.8     ?.65        22     51.0     4.05

* Significant at the 5 per cent confidence level.
** Significant at the 1 per cent confidence level.

These comparisons reveal that the students rated as most tense,
generally, and in each of the three areas of tension, did less well on the 
first midterm than did their fellow students who were rated as less tense. 
Those students judged most strongly to need affection and praise did less 
well than did those judged to have less of these needs. In the case of 
the achievement need we find that those students with higher ratings 
performed better on the examination. The differences between means 
are significant at the 5 per cent level of confidence in the cases of overall 
tension and need for praise, and at the 1 per cent level in the case of need 
for affection. 


Table 4

Relationship to Rated Personality Variables of Shifts in Scores
from First Midterm to Subsequent Examinations*

                     Midterm I to Midterm II**           Midterm I to Final**
                     Higher Ratings   Lower Ratings      Higher Ratings   Lower Ratings
Variable             Mean    S.D.     Mean    S.D.       Mean    S.D.     Mean    S.D.

Self Tension          8.8    4.78      4.7    5.45       83.5    9.12     77.7    9.92
Family Tension        8.9    4.79      5.0    5.43       80.9   10.30     80.1    9.72
Social Tension        8.3    4.93      4.9    5.59       82.0    9.32     78.8   10.36
Overall Tension       8.7    4.37      5.3    5.77       81.1   10.01     80.0    9.94
Achievement Need      6.6    3.81      6.7    6.35       83.8   10.56     78.3    8.97
Affection Need       10.8    4.12      4.9    5.08       83.2    8.40     79.3   10.35
Praise Need           8.8    4.76      5.5    5.56       84.3    8.76     78.3    9.99

* The N’s for the subgroups of subjects represented in this table may be found in
Table 3.

** All shifts are positive, for in all cases the means are higher on the second midterm and
on the final examination than on the first midterm.


The implications of these findings would seem to be that degree of 
tension and amount of need for affection and for praise are related to 
examination performance by students in large classes. One might hazard 
guesses as to the further meaning of these data, for example, the relation 
of these tensions and these needs to academic performance generally, 
regardless of class size, and the deeper meaning of the presence of high and 
low tension and strong and weak needs in regard to personality structure 
and function. 

2. Relationship of Rated Personality Variables to Performance Occurring 
after Contact between Student and Instructor. It was necessary to find a 
measure of the effect on performance of contact with the instructor, at 
the same time relating this effect to the seven personality variables being
investigated. The sole use of scores on the second midterm examination
and on the final examination would be inadequate because of the findings 
presented in the previous section. Such scores would be ambiguous in 
meaning for our purposes because of the fact that the personality varia- 
bles were shown to be related directly to performance. This relation- 
ship would somehow have to be taken into account before the effect of 
the instructor contact could be isolated. 

The measure best serving the purposes of this study was that of the 
differences in performance before and after contact with the instructor 
as related to ratings on the personality variables. Data on such shifts 
in mean performance from the first midterm examination to the second 
midterm examination and to the final examination will therefore be 
presented. 


Table 5

Significance of the Difference in Scores on Examinations Taken Before and After
Interview Contact with Course Instructor

                     Midterm I to Midterm II               Midterm I to Final
                     High on     High on     Low on        High on     High on     Low on
                     Variable    Variable    Variable      Variable    Variable    Variable
                     vs. Low     vs. Class   vs. Class     vs. Low     vs. Class   vs. Class
Variable             t           t           t             t           t           t

Self Tension         +2.21*      +3.27**     + .35         +1.72       +2.33*      + .47
Family Tension       +2.11*      +3.20**     + .59         + .24       +1.39       +1.34
Social Tension       +1.80       +3.01**     + .50         + .91       +1.90       + .83
Overall Tension      +1.75       +2.93**     + .87         + .30       +1.39       +1.35
Achievement Need     - .02       +1.57       +1.94         +1.59       +2.19*      + .73
Affection Need       +3.16**     +3.82**     + .54         +1.04       +1.78       +1.15
Praise Need          +1.68       +2.85**     +1.00         +1.67       +2.24*      + .75

* Significant at the 5 per cent confidence level.
** Significant at the 1 per cent confidence level.


In Table 4 the means and standard deviations of the differences in 
scores on examinations taken before and after interview contact with the 
instructor are presented. The experimental group is broken down into 
those students rated higher and those rated lower on each personality 
variable. 

The significance of the differences in shifts in examination scores 
between: (1) those students high on each variable and those low; (2) 
those high on each variable and the remainder of the class; and (3) those 
low on each variable and the remainder of the class, were then calculated. 
Table 5 summarizes the resulting t’s.

The data summarized in Tables 4 and 5 suggest that:


1. The effect of a single interview contact by the individual students 
of the experimental group with their course instructor was not uniform. 
There was significantly (at the 1 per cent level) more effect on those 
students rated higher on self tension, family tension, social tension, over- 
all tension, affection need, and praise need than on those rated lower— 
when one compares the performance of these students with that of the 
non-interviewed students of the class. 

2. The effect of the interview contact diminishes over the ten-week 
period before the final examination, holding up (at the 5 per cent level) 
for only three out of the seven variables, and only then in the case of 
those subjects judged relatively high on these variables. Nonetheless, 
comparison of the scores of the subgroups of the interviewed subjects 
shows them all to have higher mean scores than the mean score attained 
by the remainder of the class as a whole. 


The limitations of this study in terms of numbers of subjects in the 
experimental subgroups, lack of controls over the possibility of the operation
of “halo effect” on the personality ratings, and the fact that the data
obtained are from one class at one university and in relationship to one 
instructor do not allow for definite generalizations to students, classes, 
and instructors the world over. However, the writer feels the results of 
this study to be evidence for the value of personal interviews with students 
in large classes. These results further suggest that some students are 
handicapped in their performance by the lack of student-teacher contact 
and the lack of individualization felt when an “unknown” member of a 
class. This study points to the possibility of discovering those students 
who need most and who would profit most from individual attention. 
Conversely, it indicates the possibility of screening those students who 
would be handicapped but little by membership in a large class insofar 
as lack of contact with the instructor is concerned. It is of particular 
interest to the writer that the significant improvement in examination 
performance made by students following the interview with their in- 
structor was made after a single contact, and a contact of only one hour. 
The results of a study on the effects of continued conferences might 
truly be exciting. 


Received November 4, 1948. 





Vocabulary Item Difficulty and Word Frequency 
James J. Kirkpatrick and Edward E. Cureton 


University of Tennessee 


In constructing a vocabulary test, it is desirable in many cases to 
arrange the items of the experimental edition in approximate order of dif- 
ficulty. Test constructors often try to do this by arranging them in the 
order of frequency of occurrence of the key words. Questions immedi- 
ately arise concerning the validity of this procedure, and the possibility of 
improving it by the use of direct judgments. A study designed to throw 
some light on these matters was made, using the 100 four-choice vocabulary
items of the Army General Classification Test, Forms 1a and 1b. The
difficulties of these items, as reported by the Staff, Personnel Research
Section, The Adjutant General’s Office (4), are given in terms of the
percentages of correct responses made by the soldiers in the experimental
tryout samples. The Form 1a sample included 400 cases; the Form 1b
sample, 218. The difficulty values of the Form 1b items were adjusted to
make them comparable to those of Form 1a. The frequency of the key
word (the stem-word of the item or the correct answer, whichever was
least frequent¹) was taken as the frequency value. The frequencies were
taken from The Teachers Word Book of 30,000 Words (6). This word 
book reports the frequencies of words in terms of the number of occur- 
rences per million running words, for the 19,440 words which are en- 
countered at least once per million; and in number of occurrences per 
eighteen million running words, for those which are encountered less 
frequently than once per million, but more frequently than once per four 
million. The 952 words which appear 50 to 99 times per million are 
lumped together and simply labeled A; the 1,069 words which appear 
100 or more times per million are all labeled AA. 

The number of different words at each frequency-of-occurrence level 
forms a J-shaped distribution, which can be fairly well represented by an 
exponential function. In order to obtain a more or less symmetrical dis- 
tribution of frequency measures, the common logarithms of the fre- 
quencies of occurrence were grouped into equal intervals. The interval- 
width was determined by the fact that all words in the A-group in the 
word book (50-99 per million) had to go into one group. For the less 


1 In six of the 50 items of Form 1a, and in one of the 50 items of Form 1b, the correct
answer was of lower frequency of occurrence than the stem-word, according to the word
book.

frequent words, the numbers of occurrences per eighteen million were
divided by eighteen, and the quotients were rounded off to one decimal. 
This procedure gave eleven groups containing the following frequencies: 


         Range of Frequencies    Number of          Number of Words
Group    per Million             Words in Group     in AGCT 1a and 1b

  1      100+ (AA)               1069                 2
  2      50-99 (A)                952                 5
  3      26-49                   1256                 3
  4      13-25                   1865                18
  5      7-12                    2506                10
  6      4-6                     2638                21
  7      2-3                     3945                15
  8      .8-1                      **                13
  9      .4-.7                     **                10
 10      .2-.3                     **                 2
 11      not in list                -                 1

** Not reported in word book.
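
The grouping just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: it assumes that each group below the A-group spans one halving of frequency (an interval of log 2 in common logarithms, fixed by the 50-99 range of the A-group), and the function names per_million and frequency_group are hypothetical.

```python
import math

def per_million(count_per_18_million):
    """Convert occurrences per eighteen million words to a rate per million, to one decimal."""
    return round(count_per_18_million / 18, 1)

def frequency_group(freq_per_million):
    """Return the frequency group of a key word (1 = most frequent, 11 = not in list).

    Expects None for a word absent from the word book, otherwise a rate of
    at least .2 per million, the lowest range shown in the table.
    """
    if freq_per_million is None:
        return 11
    if freq_per_million >= 100:   # AA words
        return 1
    if freq_per_million >= 50:    # A words
        return 2
    # Assumption: every further group halves the frequency range,
    # so the group index grows by one for each factor of two below 50.
    steps = math.floor(math.log2(50 / freq_per_million))
    return 3 + steps

# Reproduce the printed ranges for groups 3 to 10.
for low, high in [(26, 49), (13, 25), (7, 12), (4, 6), (2, 3), (0.8, 1), (0.4, 0.7), (0.2, 0.3)]:
    print(f"{low}-{high}: groups {frequency_group(low)} and {frequency_group(high)}")
```

Both ends of each printed range fall in the same group under this rule, which suggests that the halving assumption is at least consistent with the table.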


We were also interested in the possibility of improving on these 
frequency-estimates of difficulty by the use of direct judgment (not, in 
this case, in testing the validity of direct judgment per se). Each of 
the 100 items was therefore typed on a 3” by 5” card, and the frequency 
group recorded on the face of the card. Each judge was presented with 
the eleven groups of cards, informed concerning the basis of the grouping, 
and requested to rearrange the cards among the eleven groups so that 
they would be more nearly in the order of their “true difficulty” (defined
as the probability that the average American soldier in World War II
would get the right answer). The judges were required to keep to exactly
eleven groups, but were not required to keep the same number of cards in
each group as the number given by the frequency-count. Judgments
were secured from five English instructors, each of whom worked independently.²
The sum of the five group-allocations was computed
for each item. These sums ranged from the minimum possible, 5, to the 
maximum possible, 55, the larger numbers representing greater judged 
difficulties. 

A third estimate of difficulty consisted simply of a count of the 
number of syllables in the key word of each item (see Flesch, 2). These 
numbers ranged from one to five. 

The validities of the three methods of estimating item difficulties 
were determined by correlating these estimates with the criterion given 

2 We are indebted to the following members of the English Department of the University
of Tennessee for making these difficulty judgments: John A. Hansen, Robert L.
Hickey, Alice E. Johnson, Clarence P. Lee, and Elizabeth G. Morris.





in the Army study. The correlations are as follows: 


Criterion with frequency            .47
Criterion with judgment             .71
Criterion with syllable-count*      .20
Frequency with judgment             .81


* Sheppard’s correction applied to standard deviation of syllable-count. 


The last correlation reported above is not a validity coefficient, and 
it is spuriously high because the judges knew the frequency groups to 
begin with. It was computed because it was needed in testing the signifi- 
cance of the difference between the first two criterion correlations. 

Inspection of these correlations immediately suggests the marked 
superiority of the frequency-plus-judgment technique, and the equally 
marked inferiority of the syllable-count technique. It seems reasonable 
to suggest, on the basis of this latter finding, that the authors of “readability”
formulas investigate the relative merits of counting syllables
as against having a single judge estimate word-difficulties. Since five 
judges participated in this study, and since they started with knowledge 
of the frequency-counts, the outcome of such studies cannot, of course, 
be predicted. 

The significance of the difference between the correlations of frequency 
and judgment with the criterion was evaluated by Hotelling’s adaptation 
of Student’s t-test (3). The value of t was 5.5, which is clearly significant
at the .001 level. 
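
The test for two correlations that share the criterion can be sketched as follows. This is a minimal Python illustration of Hotelling's t for dependent correlations, with n - 3 degrees of freedom; the function name hotelling_t is hypothetical, and the inputs are the rounded correlations printed above with n taken as the 100 items.

```python
import math

def hotelling_t(r_xy, r_zy, r_xz, n):
    """Hotelling's t (df = n - 3) for the difference between r_xy and r_zy.

    x and z are the two predictors, y is the shared criterion,
    and r_xz is the correlation between the two predictors.
    """
    # Determinant of the 3 x 3 correlation matrix of x, y, and z.
    det = 1 - r_xy ** 2 - r_zy ** 2 - r_xz ** 2 + 2 * r_xy * r_zy * r_xz
    return (r_xy - r_zy) * math.sqrt((n - 3) * (1 + r_xz) / (2 * det))

# Judgment (.71) against frequency (.47), which correlate .81 with each other.
t = hotelling_t(r_xy=0.71, r_zy=0.47, r_xz=0.81, n=100)
print(f"t = {t:.2f}")
```

With the two-decimal correlations the result comes out near 5.6, slightly above the reported 5.5; the small gap is presumably due to rounding in the printed coefficients.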

Applying the Fisher z-transformation to the correlation of .71 between 
difficulty and frequency-plus-judgment, it was found that the chances 
are 19 to 1 that its “true” value lies between .60 and .80.
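
The limits quoted above can be recovered with the Fisher z-transformation. The sketch below assumes n = 100 items and a normal deviate of 1.96 for odds of 19 to 1; the function name fisher_interval is illustrative only.

```python
import math

def fisher_interval(r, n, z_crit=1.96):
    """Confidence limits for a correlation by way of Fisher's z-transformation."""
    z = math.atanh(r)                        # transform r to z
    half_width = z_crit / math.sqrt(n - 3)   # standard error of z is 1 / sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

low, high = fisher_interval(0.71, 100)
print(f"{low:.2f} to {high:.2f}")
```

The limits are computed on the z scale and then transformed back, which is why they are not symmetrical about .71.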

A second study, concerned with difficulty and frequency only, was 
based on a set of items consisting of word-pairs to be marked S, O, or N,
depending on whether their meanings were the same, the opposite, or 
neither. Three hundred such items were administered to about 500 high 
school seniors. On the basis of total scores on the 300 items, the top 
100 and the bottom 100 examinees were selected as criterion groups, and 
68 items were discarded on the basis of failure to discriminate between 
these two groups. The difficulty of each of the remaining 232 items was 
taken as the per cent correct in the combined group of 200. The fre- 
quency value was taken as the ordinal thousand, in The Teachers Word 
Book of 20,000 Words (5), of the least frequently occurring word in the 
pair. For this set of 232 items, the correlation between difficulty and fre- 
quency was .56. 

Davis (1) has reported low correlations between item difficulty and 
stem-word frequency, as given by The Teachers Word Book of 20,000
Words (5), for three types of vocabulary items from the Cooperative
English Test. Two of these item types require the examinee to supply
a word which matches a given definition. The factor-analysis literature
(e.g., 7) suggests that such items measure “verbal fluency” to some considerable
degree, whereas items of the types reported in our own studies
measure mainly “verbal relations.” The third item type studied by
Davis was apparently more like those of the Army General Classification 
Test: a stem word followed by five alternatives from among which the 
examinee was required to select the synonym of the stem word. Using 
208 items of this type, Davis found a correlation between difficulty and 
frequency of only .10. 

It is quite possible that the superficial similarity between the Cooper- 
ative Vocabulary Test items and those of the Army General Classification 
Test is considerably greater than their actual content similarity. The 
Cooperative items were designed to measure precision of knowledge of 
fairly common words. Davis criticizes the practice of including rare 
words to provide difficult items in vocabulary tests. He says (1, pp. 
71-2), “The difficulty of a multiple-choice vocabulary item for a given 
group of subjects is dependent on two main factors: first, the per cent 
of the group that could define the word correctly if asked to state its 
meaning; and, second, the degree of discrimination required to distinguish 
between the correct answer and the incorrect answers, or decoys, in the 
item. The importance of this second point has often been overlooked 
with unfortunate results. Test constructors have built items to test for 
knowledge of words like ‘syzygy’ or ‘umbel.’ Such words have virtually 
no practical value except to specialists in certain learned professions; 
hence, they reduce the real validity of general vocabulary tests, but they 
have been included to provide very difficult items in vocabulary tests
that are not made up of items in which the decoys have been chosen with 
care and ingenuity so that they differ only slightly, though incontestably, 
from the correct meanings of the words being tested.” 

The force of this argument would appear to depend on the purpose 
for which the test is designed. We can see no objection to designing 
vocabulary tests to measure range of word knowledge at a low level of 
discrimination, as well as precision of knowledge of fairly common words. 
The same-opposite-neither test is clearly of the former type. The 
Cooperative Vocabulary Test is of the latter type. The vocabulary items 
of the Army General Classification Test fall somewhere between these 
two extremes. Examination of its item-alternatives suggests that it is 
probably more nearly a range test than a precision test. 

Comparing the three correlations between frequency and difficulty, 
there appears to be a fairly definite trend. For the precision-type
Cooperative Vocabulary Test the correlation is .10. For the vocabulary
items of the Army General Classification Test, it is .47. For the same- 
opposite-neither test, it is .56. It seems reasonable to suggest, as a 
hypothesis if not as a conclusion, that the nearer a vocabulary test comes 
to being a measure of range rather than precision of word knowledge, the 
higher will be the correlation between the frequency values of its key 
words and the difficulties of its items. Moreover, the estimates of 
difficulty based on frequency can be improved markedly by the use of 
direct judgment. 
Received May 5, 1949. 
Early publication.


Influence of Prestige Suggestion on the Answers of
a Personality Inventory * 


Joseph F. Donceel, Benjamin S. Alimena and Catherine M. Birch 


Fordham University 


The following investigation was inspired by an experiment performed 
in 1933 by two German psychologists, H. Krüger and K. Zietz (1). They
composed an artificial personality description and told each of 39 subjects 
that this description was based on a graphological analysis of their hand- 
writing and on a study of their horoscope. All the subjects accepted 
this one standard description as a good analysis of their personality; 
some were amazed at its accuracy; not a single subject rejected the 
diagnosis as a whole. 

Among the possible explanations of this surprising result, the authors 
noted: the fact that the subjects do not know their own personality struc- 
ture; their suggestibility; the vague and ambiguous character of many of 
the statements used in the personality description. 

The purpose of the present experiment was to find out to what extent 
subjects would accept as applying to them a personality description ob- 
tained by mere chance, even when the statements used in this description 
were not vague and ambiguous, and even when no effort had been made to 
avoid the contradictions which derive from a random accumulation of 
statements. 

First Experiment, Using Mild Suggestion. The subjects for the first 
experiment were 34 students in a psychology class for adults, both men 
and women, ranging in age from 20 to 55 and in education from four years 
completed in High School to two years completed in College. The 
subjects were asked to hand in a specimen of their hand-writing, and 
they were told that the experimenter would have it analyzed by a graphol- 
ogist, and would give them a detailed description of their personality, 
based on this analysis. 

In fact, the experimenter just took for each subject a Bernreuter 
Personality Inventory and answered its 125 questions at random. The 
questions were matched with the 125 first figures of a table of random 
numbers; when the figure for a certain question was even, that question 
received a “yes” answer; when the figure was odd, that question received a 

* This paper was read at the 12th International Congress of Psychology, Edinburgh 
(Scotland), July 23-29, 1948.

“no” answer. A week after the handwriting samples had been received,
the Bernreuter Inventories were given to the subjects with the affirmative 
or negative answers, and the subjects were asked to check each of the 
statements, and to indicate whether they agreed or disagreed with the 
answer. 

From chance alone we expect a number of agreements averaging 50 
per cent, that is, an average agreement with 62.5 of the 125 suggested 
answers. Any number of agreements higher than 73 would occur by 
chance alone only 5 times out of 100, whereas a number of agreements 
higher than 77 would occur by chance alone only once in 100 times. 
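
The chance expectation can be examined with an exact binomial tail. The sketch below assumes that each of the 125 suggested answers is agreed with independently with probability one half; the function name upper_tail is hypothetical, and the paper does not say whether its cut-offs come from an exact calculation or from a normal approximation.

```python
from math import comb

def upper_tail(n, k):
    """Exact probability of more than k agreements out of n when each has chance 1/2."""
    favourable = sum(comb(n, j) for j in range(k + 1, n + 1))
    return favourable / 2 ** n

# Probability of exceeding each cut-off quoted in the text for 125 questions.
for cutoff in (73, 77):
    print(f"P(agreements > {cutoff}) = {upper_tail(125, cutoff):.4f}")
```

The function returns a one-sided probability; doubling it gives the two-sided figure, so the printed values can be compared with the quoted cut-offs under either reading. The same function applies to the per-question analysis of the second experiment, for example upper_tail(50, 34) for 50 subjects.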

The number of agreements of the 34 subjects ranged from 60 to 100. 
The average number of agreements was 78 with a standard deviation of 
9.41. The results of 4 subjects excluded the null hypothesis at the 5 per 
cent level of confidence, whereas the results of 15 more excluded the null 
hypothesis at the 1 per cent level of confidence. In other words, 19 out 
of our 34 subjects agreed with the suggested statements more often than 
could be explained by chance alone. They gave evident signs of sug- 
gestibility in their self-analysis. 

Second Experiment, Using Stronger Suggestion. The second experi- 
ment employed stronger suggestion. Fifty subjects were used, 25 men 
and 25 women, ranging in age from 18 to 33 and in education from two 
years completed in High School to completion of Graduate Studies. 
Here again a Bernreuter Personality Inventory was answered for each 
of the subjects by mere chance, just by rolling dice. Each of the subjects 
was given individually a Rorschach Inkblot Test and a Murray Thematic 
Apperception Test. Next, allegedly on the basis of these tests, the experi- 
menter answered orally, in the presence of the subject, the 125 questions 
of a Bernreuter Inventory (that is, gave for each question the answer 
determined by the dice) and asked the subject to tell whether or not he 
or she agreed with that answer. 

From chance alone we expect an average number of 62.5 agreements. 
The actual number of agreements ranged from 83 to 125; the average 
was 111.6 with a standard deviation of 9.16. Since chance alone is 
excluded at the 1 per cent level of confidence for any number of agree- 
ments higher than 77, the null hypothesis was excluded for every one of 
the 50 subjects. 

There was no reliable difference between the amounts of agreements 
shown by the men and by the women. For the men the average was 
112.1 and for the women 111.0. 

Every question of the Inventory was answered for the 50 subjects. 
Therefore, from chance alone, it is expected that 25 subjects will agree 
with the suggested answer to each question. Any number of agreements 
for a single question higher than 34 is significant at the 1 per cent level. 







We obtained an average agreement per question of 44.6 with a standard 
deviation of 3.44 and a range of 35-50. Hence, for each single question, 
effective suggestion could be established at the 1 per cent level of con- 
fidence. 

If we consider each question individually for the men alone, we find 
two questions for which the number of agreements is only 17 out of 25. 
Here chance alone cannot be excluded, even at the 5 per cent level of 
confidence. These questions are: “Do you ever complain to the waiter
when you are served inferior or poorly prepared food?” and “Have you
been the recognized leader of a group within the five last years?”

The same applies for three questions of the female group: “Do you 
frequently argue over prices with tradesmen or junkmen?” (for this
question suggestion did not work at all; the percentage of agreements
was only 52); “Do people ever come to you for advice?” and “Are you
systematic in caring for your personal property?” It will be noticed that
these five questions are of a clearly factual nature. 

Immediately after the test with suggestion, the experimenter ex- 
plained to the subjects that the answers which had been presented had 
been obtained by a mere chance procedure and were therefore without 
value. He gave each subject an unanswered Bernreuter Inventory and 
asked him or her to answer all the questions personally. This would 
yield a measure of the endurance of the suggestion. 

Instead of finding the expected average of 62.5 agreements with the 
previously suggested answers, we found an average of 87.4 agreements, 
with a standard deviation of 10.2 and a range of 67-109. In 40 out of the 
50 subjects suggestion could still be established at the 1 per cent level of 
confidence. Only 6 subjects were able to shake off the suggestion enough 
to yield results insignificant even at the 5 per cent level of confidence. 

Third Experiment, with Suggested Reversal. In our last experiment 
the subjects were 49 sophomores attending a Liberal Arts College for 
Women. This time the subjects first answered the questions of a 
Bernreuter Inventory in the ordinary way, without any suggestion. 
Then they were given a group Rorschach Test. Four weeks later the 
experimenter met each subject individually and told her that, for a certain
number of answers, the results of the Rorschach Test contradicted the 
answers given by the subject. She was asked whether she did not feel 
that the answer suggested by the Rorschach Test corresponded better 
with reality. In other words, in this experiment we did more than to 
suggest a certain answer to the subject; we tried by means of suggestion 
to make her repudiate or reverse a previously given answer and accept 
the opposite answer as the true one. 

We did not try, of course, to make the subjects change all the previously
given answers; they would have suspected some trick. Reversal
was attempted under suggestion for approximately one third of the
answers, that is, for 42 answers taken at random. Of these 42 suggestions, an
average of 26 was accepted, or approximately 60 per cent. There were,
of course, considerable individual differences; the lowest number of ac- 
cepted reversals was 10 per cent, the highest number was 94 per cent. 

These results are highly significant. It is true that Lentz (2) found 
that, when subjects were retested with the Bernreuter after a lapse of 
from one to four weeks, they changed approximately 20 per cent of their 
original answers. If we take that amount of change as a measure of 
the modifications which may be due to the mere lapse of time, we find 
that the 60 per cent of change discovered in our experiment yields a chi 
square of 39.67, which is considerably above the 6.64 required for signifi- 
cance at the 1 per cent level of confidence. 


Summary 


1. The questions of a Bernreuter Personality Inventory were answered 
for a group of subjects. These answers, obtained by mere chance, were 
presented as the results of psychological tests, and the subjects were 
asked to tell whether they agreed or disagreed with these answers. 

2. When mild suggestion was used, 19 out of 34 subjects accepted the 
answers more often than could be explained by chance alone. 

3. When stronger suggestion was used, 50 out of 50 subjects yielded to 
suggestion. 

4. Subjects were also induced, under suggestion, to repudiate 60 per 
cent of their own answers to the Inventory and to accept as true the 
opposite answers.


A Note on Kahn and Hadley’s
“Factors Related to Life Insurance Selling”


S. Rains Wallace, Jr. 
Life Insurance Agency Management Association, Hartford, Conn. 


In a recent article in this Journal, Kahn and Hadley (1) have re- 
ported a study in which it was proposed: “first, to determine the degree 
of relationship that exists between relative success in the early period 
of selling life insurance and success at a later period; second, to examine 
various selling activities with a view to uncovering certain factors which 
differentiate successful from unsuccessful agents... ; third, to in- 
vestigate further certain personal history items and personality traits 
already known to correlate with success in selling life insurance, and to 
analyze other measurable areas of personality, with the aim of increasing 
the sensitivity of existing selection methods.” The authors further assert 
that “The identification of individuals for whom the likelihood of success is 
known would not only benefit management, but would, to some extent, 
minimize feelings of frustration on the part of the agent who, from the 
outset, may be doomed to failure.” 

The writer is in full accord with these aims (if dubious of “personality 
traits known to correlate with insurance success” and “measurable areas 
of personality”). However, he also believes that the sample of insurance 
salesmen employed in this study was singularly ill-chosen and 
has characteristics which serve to vitiate a number of the study’s con- 
clusions. Considerable work has been done in this field (2, 3, 4, 5, 6) and 
more is in progress. It is therefore important that major findings not 
be obscured by conclusions drawn from fragmentary and inadequate data. 

Kahn and Hadley studied 84 “new life insurance agents” who had 
attended the Purdue Course in Life Insurance Marketing. It is implied 
that these men were a random group of individuals who had just entered 
the life insurance business. This, unfortunately, is not true. Many of 
these men had sold insurance before coming to the school. Furthermore, 
there is reason to believe that the companies and agency managers in- 
volved tended to send to the school those men whom they regarded as 
most promising. This seems probable in light of the fact that many of 
the men were subsidized to some degree by companies or managers during 
the course. Certainly, there is little evidence that the group is in any 
sense representative of new life insurance salesmen in general. 

The authors state that the salesmen represented 19 life insurance 
companies. What they neglect to mention is that life insurance com- 
panies are not homogeneous with respect to agents’ production. One 
study (5) has shown that, among 11 large insurance companies in Canada, 
the companies’ median average monthly production of agents who 
survived 12 months ranged from $5,500 to $13,700 in the first year. 
Even among 7 United States companies of equivalent size, an analysis 
of variance shows that the first-year sales production of agents who 
survive for 12 months is heterogeneous at the 1% level. 

In short, the sample employed is not relevant to the problems as 
stated, is curtailed to an unknown degree, and the criterion of success 
(sales for the duration of the school term) is contaminated by unrecog- 
nized and, with the number of salesmen involved, undetectable company 
differences. 

Most of the conclusions listed by the authors are therefore question- 
able. It is stated that the correlation between sales during the first 13 
weeks of selling and a second period of 13 or more weeks is +.55. The 
statement should read “during the first 13 weeks of selling after entrance 
in a school.” It should be qualified by noting that the distribution is 
curtailed and that the correlation has probably been increased spuriously 
because of the effect of company differences. 

The curtailment involved in the selection of the sample must also be 
considered in interpreting the statement that only one of the four personal 
history items investigated differentiates significantly between successful 
and unsuccessful life insurance salesmen. If the authors had employed 
more cases drawn from a sample of truly “new” agents and avoided 
widespread criterion categories, they would have found that age at entry 
has a significant, but curvilinear, relation to a success criterion (5, 6) 
and that minimum monthly income required has a similarly curvilinear 
and significant relation (6). They would also have found that agents 
with no dependents are significantly inferior to others in their first-year 
performance (6). 

The conclusions concerning the differentiation of the criterion groups 
by various test items and total test scores are, of course, open to the same 
criticisms. Furthermore, the implication that this work is of value in the 
“identification of individuals for whom the likelihood of success is 
known” and, therefore, in the selection of life insurance agents, becomes 
highly suspect if it is remembered that many of these individuals were 
tested when their life insurance careers were well under way. The fact 
of success or failure may be a powerful determiner of test responses. 

The problems of sampling, of restriction of range, and of criterion 
contamination are as real in investigations of the salesman as in any other. 
Studies in which these problems are unrecognized or ignored can serve 
only to introduce further inaccuracies into an already confused field. 


A Comment on Wallace’s Note on 
“Factors Related to Life Insurance Selling” 


J. M. Hadley and D. F. Kahn 
Division of Education and Applied Psychology, Purdue University 


It would appear that Wallace (9) is quite concerned that readers will 
misinterpret a recent article by Kahn and Hadley (3). Careful examina- 
tion of the article in question will reveal that no generalizations are 
offered. The opening paragraph of the section entitled “Summary and 
Findings” on page 138 reads as follows: “Based solely on the criterion of 
written business, and pertaining only to those particular life insurance 
salesmen investigated in this study, the following conclusions may be 
drawn.” All references in this section are to differences which “were 
found” to exist within the group of salesmen studied. No predictions 
were made concerning results which might be obtained from other 
samples. It is difficult for the writers to understand how “inaccuracies 
can be introduced into an already confused field” if research reports are 
read objectively and unintended generalizations are not inferred from 
admittedly “fragmentary and inadequate data.” 

Several of Wallace’s points will be considered separately: 


1. Wallace believes that “the sample of insurance salesmen employed 
in this study was singularly ill-chosen and has characteristics which serve 
to vitiate a number of the study’s conclusions.” The sample may be 
inadequate in many ways. It would be excellent if an entirely unselected 
sample could be obtained. It is doubtful whether such an entirely unselected 
sample has ever been studied. The samples of recruits considered in the 
excellent studies by the Life Insurance Agency Management Association 
(4, 5, 6, 7, 8) are undoubtedly more adequate than those studied by Kahn 
and Hadley. Certainly, interest in being recruited also biases the samples 
studied by the Association to an unidentified degree. However, it is 
maintained that inadequacies inherent in the sample do not vitiate the 
conclusions concerning differences within the group. 

2. Wallace criticizes the designation of the subjects of the study as 
“new life insurance agents.” He also states that “many of these individuals 
were tested when their life insurance careers were well under 
way.” A careful recheck of the data indicates that 95 per cent of the 
sample had not sold insurance before coming to the Purdue Life Insurance 
Marketing School. Actually, only four of the original 84 beginning 
students had ever sold insurance. Two subjects had sold, or attempted to 
sell for longer than three months: one for nine months, and one for two 
years. The experimenters did not intend to include any subjects re- 
porting more than three months’ experience. Apparently two subjects 
were included by error. Shortly after the data were collected, the school 
began to require a minimum amount of experience. This was not true 
at the time the study was conducted. Furthermore, the original intent 
was to gather information of value to the school. In line with that 
purpose, it is submitted that the best sample would be classes of students 
in that school. Consequently, the new agents in classes I and II were 
selected. It is agreed that the sample is pre-selected by their companies 
and agency managers. The subjects may not be “in any sense representative 
of new life insurance salesmen in general” but they are representative 
of the first two classes attending the school. Again it is emphasized 
that the conclusions are limited to this group. 

3. It is unfortunate that Kahn and Hadley in their condensed pub- 
lished article neglected to recognize the lack of homogeneity between 
life insurance companies and other complexities of the problem. Kahn 
(1, 2) in the original thesis has discussed the complexity of the problem 
at length. 

4. Wallace states “most of the conclusions listed by the authors are 
therefore questionable.” For some of the reasons which he states, 
generalizations would be questionable, but one cannot question conclusions 
and results of a specific research study without questioning the integrity 
of the research workers. The writers accept the suggestion that the 
statement on page 135, line 7, of the results should read, “. . . during the 
first 13 weeks of selling after entrance in the school.” It should be noted 
that the word “a” as suggested by Wallace has been changed to “the” 
by the writers. It would be interesting from the standpoint of re- 
search methodology to discover whether the effect of curtailment on 
the distribution and the effect of company differences, as discussed by 
Wallace, would increase or decrease the size of the reported coefficient 
of correlation. 

5. Wallace seems to be disturbed that Kahn and Hadley did not 
obtain the same results as were obtained in several studies which he has 
reported. With a larger sample it is entirely possible that many differ- 
ences would have been found to be more highly significant. On page 
136 it is reported that age at entry, number of dependents, and minimum 
living expenses per month showed a positive relationship to the criterion. 
Dichotomies made in the range of each of the above-mentioned personal 
history items offer a means of showing the relationships between these 
measures and the criterion. Thirty-one agents of 30 years of age and 
above averaged a mean weekly production of $5905 as contrasted with 
an average mean weekly production of $4612 for the 47 agents at age 29 
or below. A similar average for 32 agents claiming two or more de- 
pendents was $6303 per week in comparison with $4307 per week for 
46 agents claiming one or no dependents. For 13 agents requiring 
$280 a month or more as a minimum living expense the average mean 
weekly production was found to be $6554 as contrasted with $5022 for 59 
agents requiring below $250 for living needs. Further manipulation of 
the data was not attempted because of the recognized inadequacy of 
the sample. Apparently the results obtained do tend to confirm those 
discussed by Wallace. 

6. Wallace’s criticism of the conclusions concerning the differentia- 
tion of the criterion groups by various test items and total test scores 
is, as previously discussed, again not considered relevant to the results 
but would be relevant to generalizations based on them. 

7. Finally, Wallace states that the fact of success or failure may be 
a powerful determiner of test responses. This is granted, particularly in 
reference to the preference tests and, to a lesser degree, personality tests. 
It is doubtful if it affects intelligence or biographical data. However, if 
test responses at any stated level of experience have any predictive value 
for future success, then they have validity for that purpose. In this con- 
nection some of the results of the study in question are indicative of the 
need for further research. 
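The subgroup figures reported under point 5 can be cross-checked by pooling them. The age split and the dependents split each cover 78 agents, so both should imply about the same overall mean weekly production; the living-expense split covers only 72 agents. This is a minimal sketch using only the numbers quoted above; the function name is illustrative.

```python
def pooled_mean(groups):
    """Weighted mean of subgroup means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    if total_n == 0:
        raise ValueError("no cases to pool")
    return sum(n * mean for n, mean in groups) / total_n


# (number of agents, mean weekly production in dollars) as quoted in point 5.
by_age = pooled_mean([(31, 5905), (47, 4612)])
by_dependents = pooled_mean([(32, 6303), (46, 4307)])
by_living_expense = pooled_mean([(13, 6554), (59, 5022)])

print(f"age split:            ${by_age:,.0f}")
print(f"dependents split:     ${by_dependents:,.0f}")
print(f"living-expense split: ${by_living_expense:,.0f}")
```

The first two pooled means agree to within a dollar, which suggests the two dichotomies describe the same 78 agents; the third differs because it rests on a smaller subset.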


The writers gather from the general tone of the note that Wallace feels 
research in this area has been retarded and confused rather than advanced 
by the publication of the study being discussed. It is doubtful whether 
any research problem can be clarified by withholding legitimate data 
(and all data are legitimate) even though the population from which the 
data are derived is inadequate. Even a single case datum is sometimes 
valuable. We must use care not to generalize beyond the scope of 
the data. 

The writers would like to take this opportunity of urging that the 
valuable research by workers in the life insurance field be published in 
the scientific psychological journals so that it will be more readily 
available to academic research workers. 


Instrument Reading. I. The Design of Long-Scale Indicators 
for Speed and Accuracy of Quantitative Readings * 


Walter F. Grether 
Aero Medical Laboratory, Wright-Patterson Air Force Base, Dayton, Ohio 


Quite a number of instruments used in aviation and elsewhere must 
be read with precision greater than can be provided by one revolution of 
a pointer on a circular dial of conventional size. There is considerable 
accumulated evidence that, except for the direct reading counter, most 
of the devices that have been used to increase effective scale length result 
in instruments that are very difficult to read. In a previous study by the 
author (2) on the design of clock dials, it was found that as common an 
instrument as a clock is quite difficult to read. Even the best clock 
designs required approximately 5 seconds (including recording time) for 
readings in hours and minutes by Air Force pilots. Even with this time 
spent on each reading, about 7 per cent of the readings on the better 
clocks were in error. 

Aside from such laboratory data there is considerable evidence of 
instrument reading difficulties in the practical situations where these 
instruments are used. In a study of actual errors made by pilots in 
reading aircraft instruments carried out by Fitts and Jones (1), multiple- 
pointer or long-scale instruments provided the greatest number of serious 
cases of instrument misreading. The instrument reported as being 
misread most frequently was the altimeter. In the typical report the 
altimeter was read too high by a complete revolution of the most sensitive 
pointer, that is by 1000 feet. A tachometer designed with a rotating 
sub-dial to indicate RPM in thousands was likewise read too high by 
1000 RPM. Numerous fatal and non-fatal accidents have been attrib- 
uted directly to such instrument reading errors, and without doubt 
many of the unexplained crashes resulted from similar human failures. 

The major purpose of the present investigation was to make a direct 
comparison in terms of speed and accuracy of quantitative readings of 
several of the possible methods of obtaining increased scale length on 
instruments. The experiment also had a secondary but more specific 
and practical purpose of finding improved methods of indicating altitude 
in aircraft. For this reason all of the instruments were designed to read 
altitude in feet and all readings were made in feet as units. 

* The data presented in this paper have been previously reported in Memorandum 
Reports No. TSEAA-694-14 and MCREXD-694-14A of the Aero Medical Laboratory, 
Engineering Division, of the USAF Air Materiel Command. 

It is emphasized that the evaluation of the different indicator designs 
in this investigation was with respect to the speed and accuracy of 
quantitative readings. Actually this is only one of several criteria which 
most instruments should be required to satisfy. It has been pointed out 
by the author (3) that in aviation in particular there would appear to be 
at least three major ways in which instruments may be read, depending 
upon the purpose of the reader. These three types of reading may be 
categorized as follows: a. Check reading—for assurance of a null, normal, 
or desired indication; b. Qualitative reading—for the direction and 
approximate magnitude of a deviation from a null, normal, or desired 
indication; and c. Quantitative reading—for the numerical value of an 
indication. 

The above categories of instrument reading have considerable utility 
as criteria against which to evaluate different instrument designs. It is 
usually possible from a knowledge of the situation in which an instrument 
is to be used to decide the reading purposes or criteria which it is most 
necessary to satisfy. The criteria against which an instrument is to be 
evaluated then provide operational definitions of the experimental meas- 
urements to be made. As mentioned earlier, the experimental indicators 
in this investigation were evaluated only with respect to the third 
criterion, quantitative reading. In this study, furthermore, there was no 
concern with small errors of interpolation, only with larger errors re- 
sulting from assignment of incorrect values to graduation marks. 


Experimental Procedure 


Nine experimental altitude indicator designs were used in this investigation. 
These are shown along with some of the results in Figure 1. The first of these 
indicators, design A, is a simulation of the altimeter almost universally used in 
military and larger commercial aircraft. On this instrument the longest 
pointer gives readings in hundreds of feet, the broad pointer is read on the 
same scale in thousands of feet, and the small pointer is read on the same scale 
in ten-thousands of feet. Altimeter designs B and C also simulate existing 
but not commonly used types. 

Altimeter design D uses a single pointer to indicate altitude in hundreds 
of feet. This pointer makes one revolution for each 1000 feet change in altitude 
and the multiples of 1000 feet are indicated on a simulated direct reading 
counter. This counter has two drums, one for 1000-foot and the other for 
10,000-foot increments. It is assumed that the motion of these drums would 
be intermittent and that single whole numbers would always be showing. 

In design E, also, only one pointer is used, but two dials rotating behind a 
window indicate the multiples of 1000 feet. In this design the motion of the 
dials showing through the window is assumed to be continuous rather than 
intermittent, thus permitting more than one number (or half numbers) to 
appear. 

Design F indicates altitude in quite a different manner from the other 
instruments. In this display the pointer is assumed to make only one revolution 
to cover the entire altitude range. The range being covered is indicated 
in the window as 0-1000 feet, 0-10,000 feet, or 0-100,000 feet. The meaning 
of the numerals on the dial graduations is, therefore, determined by the range 
indicated in the window. This indicator is similar in principle to a radio 
altimeter now in use. It is obvious that the precision of indication on such 
an instrument decreases as the range being covered increases. 
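The loss of precision with increasing range can be illustrated with a small calculation. The sketch assumes the dial carries 50 graduation intervals per revolution (that is, 20-foot marks when one revolution covers 1000 feet); this figure is an assumption made for illustration, and the constant and function names are hypothetical.

```python
# Assumption: 50 graduation intervals per pointer revolution,
# i.e. 20-foot marks when one revolution spans 1000 feet.
GRADUATIONS_PER_REVOLUTION = 50


def feet_per_graduation(range_feet: int,
                        graduations: int = GRADUATIONS_PER_REVOLUTION) -> float:
    """Altitude represented by one graduation interval for a given full-scale range."""
    return range_feet / graduations


for full_scale in (1_000, 10_000, 100_000):
    step = feet_per_graduation(full_scale)
    print(f"range 0-{full_scale:,} feet: {step:,.0f} feet per graduation")
```

Under this assumption each tenfold increase in range makes every graduation interval ten times coarser.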

Altimeter designs G and H are similar in that they simulate a scale moving 
vertically behind a window. An instrument following design G could use 
either an endless tape or drum to present the moving scale, with a counter to 
indicate multiples of 1000 feet. An instrument using design H would require 
a very long tape with a scale covering the desired altitude range. 

Fig. 1. Speed and accuracy in reading altitude from different types of instruments. 


The last experimental design, I, simulates a simple direct reading counter 
without any moving pointer or scale. For reasons pointed out later in the 
discussion of results, such an indicator would probably be unsatisfactory for 
the pilot, but might be suitable for other aircrew members such as the navi- 
gator. One of the major reasons for including it in this study was to get an 
approximate measure of the time required to copy a series of numbers repre- 
senting an altitude reading, it being assumed that no interpretation time would 
be involved in reading altitude from this type of indicator. 

For each of the altimeter designs used in this experiment a test booklet 
was prepared. The cover (page 1) of each booklet presented the experimental 
subject with detailed instructions for reading the dial design in that booklet, 
and a sample dial for the subject to read. On the two inside pages, 2 and 3, 
the dial design was reproduced with 12 different settings. Under each picture 
was a space for writing in the reading.¹ 


¹ The large number of drawings needed for the nine test booklets were produced by 
Miss Mary Cowles of the Psychology Branch with the photographic assistance of 
Mr. D. M. Penrose of the Laboratory Services Unit of the Aero Medical Laboratory. 

Special precautions were taken in the preparation of the drawings and choice 
of altitude settings to be used in the various test booklets to prevent biasing 
the results for or against any of the indicator designs. The circular dials were 
214 inches in diameter. From this, other dimensions can be estimated from 
Figure 1. All essential numerals and graduation marks were sufficiently large 
and distinct to be easily legible. Except for the inner dials on designs B and 
C all scales were alike in having numerals at all 100-foot graduations with 
intervening marks at 20-foot intervals. Other factors equalized were the 
number of settings above and below 10,000 feet, the number of sensitive 
pointer settings on 100-foot graduation marks, the number of sensitive pointer 
settings just preceding and just following the zero on the scale, and the number 
of sensitive pointer settings on the left and right halves of the dial. Precau- 
tions were also taken to be sure that no essential information was hidden by 
any of the hands, and that the interrelationships between pointer positions 
were correct. For indicator design F some of the settings were midway 
between graduation marks. For the remaining designs the sensitive pointer 
(or reference mark) was always on a graduation mark. Thus, no interpolation 
was required to obtain correct readings. 

The altimeter reading test was taken by 97 USAF pilots in the Instrument 
School at Barksdale Field, Louisiana, and 79 college men (without aircrew 
experience) at Denison University, Granville, Ohio. In administering the 
test, the booklet for only one altimeter design was passed out at a time, and 
sufficient time was allowed for reading the instructions and working the sample 
item. At a signal all subjects opened the booklet and worked until completing 
all items. Each subject’s completion time was recorded on his booklet. 
Four sequences for administering the nine test booklets were used in order to 
counterbalance for learning effects. An approximately equal number of subjects 
(in each of the two subject groups) took the test in each sequence. 

The two subject groups of dissimilar experience were used in order to get 
some measure of the effect of experience on the ability to read the various dial 
designs. All of the USAF pilots can be assumed to have spent several years 
flying with altimeter design A, and possibly to have had some experience with designs B 
and C. The college men can be assumed to have had little, if any, experience 
in reading altitude from any type of indicator. In general intelligence and 
education the two groups were very similar. 


Results of Comparisons Among Indicator Designs 


The data obtained in this investigation were analyzed to determine 
the frequency of errors and the time per instrument reading. These 
results are shown in Table 1, which gives the per cent of total readings 
in error for the nine indicator designs.² None of the errors included 
in this table resulted from inaccuracies in pointer interpolation since all 
settings of the sensitive pointers were on graduation marks (except for 
design F which had some settings midway between marks). 

Data on speed of reading are also shown in Table 1. It will be recalled 
that the subjects wrote their answers in the test booklets and the 
time for completion was recorded in each instance. The average time per 
reading could thus be computed from the total time and the total number 
of items, but this time included the time for recording as well as for 
reading or interpreting the instrument. 

² Altimeter reading errors during actual flight probably occur with much lower 
frequency than found in this study, since in flight the pilot can anticipate the 
approximate readings. 


Table 1 

                  USAF Pilots, N = 97           College Men, N = 79 
Altitude 
Indicator      Per Cent    Seconds per       Per Cent    Seconds per 
Design         Errors      Reading*          Errors      Reading* 
                 (a)          (b)              (c)          (d) 

A               15.9          9.6             20.8          9.8 
B               15.0          8.6†            17.9          9.6 
C                8.3†         7.3†            11.4†         7.6† 
D                3.5†         4.2              2.1†         4.1† 
E               17.3          8.8             15.3†         9.2 
F               24.1          8.7†            21.0          8.3† 
G                2.1†         4.8†             3.0†         4.2† 
H                2.5†         4.2†             4.5†         4.2† 
I                0.6†         2.5†             0.3          2.3† 

                                                                             Confidence 
                                              N                    r          level 
Correlation between speed and accuracy 
for different designs: 
  For pilots (columns a and b)                9 designs            [illegible]  1% 
  For college students (columns c and d)      9 designs            [illegible]  1% 

Correlation between pilots and college 
students on different designs: 
  Per cent errors (columns a and c)           9 designs            .95          1% 
  Seconds per reading (columns b and d)       9 designs            .99          1% 

Correlation between speed and accuracy of 
individuals on all designs: 
  Pilots                                      97 pilots            .38          1% 
  College students                            79 college students  .44          1% 

* Reading time included time for subject himself to record answer. 
† Indicates statistical significance (at one per cent level of confidence) of superiority 
over conventional altimeter (design A). 

A reproduction of each of the experimental indicator designs accom- 
panied by graphic illustrations of the more significant findings is provided 
in Figure 1. The upper pair of bars under each indicator shows the 
per cent of errors equal to or exceeding 1000 feet for the two groups of 
subjects. The lower pair of bars gives the computed interpretation time 
for each of the two groups of subjects. An estimate of the time for inter- 
pretation only was obtained by subtracting from the average time for 
each design the average time for design I (the direct reading counter). 
The reading of altitude from design I involved the mere copying of the 
numbers shown, and hence was assumed to require no interpretation. 


Discussion of Results 


Indicator Designs A, B, and C. The results of this investigation, as 
shown in Figure 1 and Table 1, show that Design A, which simulates the 
conventional altimeter, is a very difficult instrument from which to ob- 
tain quantitative readings as required in this study. Even the pilots, all 
of whom had spent several years flying with this instrument, spent more 
time per reading on this indicator than on any of the other designs 
studied. Only one of the remaining eight indicators, design F, resulted 
in a higher proportion of errors. It must be concluded that it is a very 
difficult task to combine into a single numerical value the readings of 
three pointers indicating on a single scale, as required in reading the con- 
ventional altimeter. Designs B and C apparently were slightly easier 
to read than design A. 

Indicator Design D.³ This indicator uses only one pointer, with the 
1000-foot and 10,000-foot indications provided by a counter. Such a 
combination proved to be very easy to read. For USAF pilots the per 
cent of total errors was very low, 3.5 per cent, and only 1.7 sec. was 
required for interpretation (as contrasted with 15.9 per cent and 7.1 
sec. for the conventional altimeter). More significant, perhaps, is the 
finding that only 0.7 per cent of the readings erred by more than 1000 feet. 
Most of the errors in reading indicator design D resulted from assigning 
10 feet instead of 20 feet to each of the graduation intervals between 
numerals. 
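The interpretation times quoted above follow from the subtraction described earlier: the average seconds per reading for each design, less the average for design I (the direct reading counter). The sketch below uses the pilots' values as read from Table 1, assuming the nine rows correspond to designs A through I in order; the variable and function names are illustrative.

```python
# Average seconds per reading for USAF pilots, as read from Table 1.
# Assumption: the nine table rows run from design A to design I in order.
PILOT_SECONDS = {
    "A": 9.6, "B": 8.6, "C": 7.3, "D": 4.2, "E": 8.8,
    "F": 8.7, "G": 4.8, "H": 4.2, "I": 2.5,
}


def interpretation_times(seconds_per_reading, baseline="I"):
    """Subtract the baseline design's time (copying only) from every design's time."""
    base = seconds_per_reading[baseline]
    return {design: round(t - base, 1) for design, t in seconds_per_reading.items()}


pilot_interpretation = interpretation_times(PILOT_SECONDS)
for design, seconds in pilot_interpretation.items():
    print(f"design {design}: {seconds:.1f} sec. interpretation")
```

With these inputs the sketch gives 7.1 sec. for design A and 1.7 sec. for design D, matching the figures in the text.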

Indicator Design E. The substitution for two of the pointers on the 
altimeter of two rotating dials appearing through a window appears to 
have no advantage. This indicator was designed so that under most 
circumstances only one number would appear on each of the two rotating 
dials. But if such dials rotate continuously (rather than intermittently) 
during altitude changes, as assumed in this test, it is inevitable that at 
certain settings two numbers will be equally visible. Such indications 
are very difficult to read correctly. 

Indicator Design F. On this indicator the range covered by the indi- 
cating pointer and scale is dependent upon range limits shown in the 
window. The high proportion of errors and slow reading time suggest 
that the required changes in interpretation of the primary scale are 
difficult for human beings to carry out. 


³ On the basis of this study, indicator design D, combining a sensitive pointer with a 
direct reading counter, was recommended as a replacement for the existing three-pointer 
altimeter. As a consequence the Kollsman Instrument Division of the Square D Company 
is now developing such an altimeter. Two other items of aviation equipment 
currently being developed by the Air Force, an absolute (radio) altimeter and an 
airborne distance measuring device, are also incorporating this type of indication. 


Table 2 

Frequency of Various Types of Error Made by 97 USAF Pilots and 79 College Students 
in Reading the Conventional Three-Pointer Altimeter 

Columns: Per Cent of Total Readings in Which Error Appeared, for Pilots and for 
College Students. [Most entries are illegible in this copy; the only surviving 
values are listed under the first error type.] 

Description of Error, with sub-types tabulated 

Reading to nearest numeral instead of to lower adjacent numeral 
(failure to consider more sensitive pointer): 0.09, 2.58, 1.72, 4.39 

Reading to lower adjacent numeral when nearest numeral is correct 
(failure to consider more sensitive pointer): 100 Ft.; 1,000 Ft.; 10,000 Ft.; Total 

Displacement of digit in number series 
(interchange of digit with adjacent zero): 20 Ft.; 100 Ft.; 1,000 Ft.; 10,000 Ft.; Total 

Misreading of scale or numeral: 20 Ft.; 100 Ft.; 1,000 Ft.; 10,000 Ft.; Total 

Omission of one pointer: 100 Ft.; 1,000 Ft.; 10,000 Ft.; Total 

Pointer exchange: 100 and 1,000 Ft.; 100 and 10,000 Ft.; 1,000 and 10,000 Ft.; Total 

Repetition of reading on one pointer 

Complex and unclassified errors 


Indicator Designs G and H. The vertical moving scale instruments 
proved to be easy to read in this experiment. The virtues of such 
instruments for check reading and qualitative reading were not evaluated 
in this study. 

Indicator Design I. This indicator, which simulates a simple Veeder 
counter, was read with greatest speed and accuracy of all the indicators. 
This would suggest that where only quantitative readings are to be 
provided this would be the most desirable type of instrument. It is 
believed that for check reading and for qualitative reading such an 
instrument would be inferior to one using a moving pointer. 


Types of Error in Reading the Three-Pointer Altimeter 


Because of the widespread use of the three-pointer altimeter, and 
because of the frequent use of this type of multiple pointer indication for 
other purposes, it seemed worth-while to make a more detailed analysis of 
the types of errors made in quantitative readings of this instrument. 
This analysis was based on the same data that have already been sum- 
marized in Table 1 and Figure 1. It will be recalled that 97 USAF pilots 
and 79 college students each made 12 readings on the three-pointer alti- 
meter. This gave a total of 1164 readings by pilots and 948 by non- 
pilots. 

The detailed classification of errors into types and sub-types is shown 
in Table 2 along with the per cent of total readings in which each occurred. 
Two or more types or sub-types of errors were in some cases charged 
against a single erroneous reading. For this reason the figures in the per 
cent columns total up to more than the total per cent errors as reported 
in Table 1. For detailed descriptions of all the error types, and the 
assumed mental processes which led to the incorrect answers, the reader 
is referred to Aero Medical Laboratory Memorandum Report No. 
MCREXD-694-14A. 
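
The bookkeeping described above, in which one erroneous reading may be charged with several error types, can be sketched as follows. This is only an illustration: the four readings and the type labels are hypothetical, not data from the study, and the function name is an assumption.

```python
from collections import Counter

def error_percentages(readings):
    """Each reading is a set of error-type labels; an empty set is a correct reading."""
    n = len(readings)
    counts = Counter(label for reading in readings for label in reading)
    pct_by_type = {label: 100.0 * c / n for label, c in counts.items()}
    # A reading counts once here, however many error types it was charged with.
    pct_erroneous = 100.0 * sum(1 for reading in readings if reading) / n
    return pct_by_type, pct_erroneous

# Hypothetical readings: the third is charged with two error types at once.
readings = [set(), {"omission"}, {"omission", "misreading"}, set()]
pct_by_type, pct_erroneous = error_percentages(readings)

# Totals quoted in the text: 12 readings per subject.
pilot_readings = 97 * 12
student_readings = 79 * 12
```

Because the third reading is counted under two types, the per-type percentages add up to more than the percentage of erroneous readings, which is the effect noted for Table 2.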


Discussion 


In an experiment such as this a number of questions arise with regard 
to the suitability of the criterion measures which have been used and with 
regard to the effect of the subject group upon the results. For this reason
there have been included in Table 1 a number of correlation coefficients 
which bear on these problems. 

A serious question facing the experimenter is the effect of the experi- 
ence of the subject group upon the validity of the findings. In the 
present experiment two subject groups were used which represented ex- 
tremes in experience as related to the task being performed. All USAF 
pilots had had considerable experience in reading one of the experimental 
indicator designs along with general experience in reading aircraft
instruments. The college students, on the other hand, included no pilots or
other military air crew members. In spite of this difference in back- 
ground of experience the two groups gave highly similar results as indi- 
cated by a correlation between the two groups of .95 on per cent errors 
and .99 on seconds per reading. This would suggest that the previous 
experience of the subjects is of relatively minor importance in an experi- 
ment of this type. 

In the present experiment neither speed nor accuracy of response was
controlled, thus making possible two independent criteria for evaluation 
of the different dial designs. In Table 1 the correlations between speed 
and accuracy for the different dial designs are .91 for pilots and .95 for 
college students, indicating very high agreement between the two criteria 
for goodness of the several indicator designs. Or stated differently, the 
indicator designs which were read most rapidly were also read most 
accurately. Correlation coefficients between speed and accuracy of in- 
dividuals for all designs are also positive, but much lower, .38 for pilots 
and .44 for college students. These values indicate, however, that in 
general the individuals who read the indicators most rapidly also read 
them most accurately. In a previous study by the author (2) on clock 
dial designs the correlation coefficients were likewise positive, but some- 
what lower in magnitude. 
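
A product-moment correlation of the kind quoted above can be computed as in the following sketch. The figures for the nine designs are hypothetical placeholders, not the values of Table 1, and it is an assumption that the reported coefficients were Pearson correlations of this form.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-design means for nine indicator designs.
seconds_per_reading = [7.1, 6.0, 5.2, 3.9, 3.1, 2.8, 2.5, 2.4, 1.7]
per_cent_errors = [11.7, 9.8, 6.1, 3.0, 1.9, 1.4, 0.9, 1.1, 0.4]
r_designs = pearson_r(seconds_per_reading, per_cent_errors)
```

A value of r near +1 for such data would mean that the designs read most slowly were also read least accurately, which is the sense in which the two criteria agree.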

In two previous experiments on instrument design by Loucks (4) and 
Sleight (5) a somewhat different technique was used in that the instru- 
ment exposure time was controlled tachistoscopically and only accuracy 
of reading was measured. Such a technique might be expected to force an 
increased error rate and thus accentuate the differences between indicator 
designs. It is the belief of the author, however, that such a control of 
exposure does not constitute control over response time, but serves 
rather to restrict the number of visual fixations of the displayed material. 
The actual response may be delayed for several seconds during which 
the subject retains a mental image of the indicator scale and pointer. 

It is quite possible that in the experiment of Sleight (5) the use of a 
controlled exposure time which did not permit a change in the prepara- 
tory eye fixation led to erroneous findings. It is believed that this tech- 
nique favored the fixed pointer indicators on which the subject was able 
to anticipate the location of the pointer. The two fixed pointer indicators 
in the present study, designs G and H, showed no general superiority 
over the only comparable moving pointer indicator, design D. 


Summary 


An evaluation was made of the speed and accuracy with which 
quantitative readings could be made of nine experimental altitude
indicators. The results are considered to apply also to other types of
quantitative indication which require very great scale length. Evalua- 
tion of the various indicator designs was made by having 97 USAF pilots 
and 79 college men read 12 settings on each instrument. The instru- 
ment faces were reproduced in test booklets which provided spaces for 
writing in the readings. Both accuracy and speed-of-reading data were 
obtained for each of the nine indicator designs. 

The major conclusions indicated by the results of this investigation 
are as follows: 


1. The combining into a single numerical value of the indications 
from two or more pointers, or from a pointer and rotating subdials, is a 
relatively difficult task for human beings. Such instruments are con- 
ducive to very large errors in reading. 

2. The ease with which long scale indicators can be read quantita- 
tively appears to depend upon the extent to which the digits are already 
combined in the proper sequence by the instrument. 

3. A multiple pointer instrument such as the altimeter with contin- 
uous motion of the non-sensitive pointers is frequently read too high by 
a complete revolution of the sensitive pointer. 

4. The speed and accuracy of instrument reading are positively 
correlated, indicating that gains in reading speed can normally be ex- 
pected to improve accuracy also. 


5. College men without altimeter reading experience showed virtually 
the same pattern of results in this study as highly experienced USAF 
pilots, suggesting that instrument reading difficulties are quite basic in 
nature and not readily modified by experience. 


Received October 25, 1948. 
Types of Errors in Location Judgments on Scaled Surfaces.
I. Errors of Configuration * 


Adelbert Ford 
Department of Psychology, Lehigh University 


Many instruments have been so designed that an operator is required 
to locate the position of a signal on a flat area with reference to a super- 
imposed system of scales, which may be either polar or rectangular. This 
study deals with the latter. The nature of this signal may be a small 
dot, or a white patch of small dimensions, which appears suddenly and 
must be reported for its elevation on the y-axis and its horizontal location 
on the x-axis. This is typical in using some of the types of radar cathode- 
ray scopes. 

In practice there are two systems for keeping such a signal or spot 
located. 1. A transparent plastic scale, engraved with suitable reference 
lines, may be placed over the area, the operator locates the proper line, 
follows to the end, and notes the position between engraved numerals, 
interpolating between lines when necessary. 2. The operator may be 
provided with a “tracker” which he pushes around the face of the area, 
keeping it superimposed on the signal, and this mechanism registers the 
x- and y-values on remotely located meters. The latter has been found
to be objectionable because it usually requires two operators and is
mechanically complex. However, the simple method of using scaling
assistance may involve intolerable errors and greater mental
concentration. It is therefore desirable to know, on the basis of
quantitative experimentation, just what kinds of errors an average
operator commits in using scaling assistance. If these types of errors are found to be intolerable,
then it is worth while to pursue such engineering design as may eliminate 
the human error in scale reading methods. 

The present study deals with the first of four types of reading errors
on scaled surfaces. This will be called errors of configuration because we
shall show that the configuration or shape of the field produces systematic
errors in one part of the field as contrasted with errors in another part
of the field.

* This research was executed under Contract No. W28-099-ac-130 between the
Institute of Research, Lehigh University, and the USAF Air Materiel Command,
Watson Laboratories, Red Bank, N. J. The investigation was made to ascertain the
accuracy of radar operators in the interpretation of scope signals.

The author wishes to thank the psychologists on the staff of the Aero Medical
Laboratory, Psychology Division, Wright-Patterson Air Force Base, Dayton, Ohio,
for suggestions concerning the equipment area needing study, and the officers and
psychologists of the Strategic Air Command, Andrews Field, Washington, D. C., for
field facilities in securing typical operating records.

A sector scope is essentially a triangular area. This sets the condition 
for the perspective illusion. Objects near the apex of the triangular space 
appear larger than at the open end of the triangle. It would be expected 
that elevation judgments, with respect to a zero line of reference, would
be correspondingly exaggerated. The questions are: How much? Are 
all people susceptible? 

There are many citations of general principle in the literature. 
Ponzo (1) showed the principle with respect to estimated lengths of 
lines. Köhler and Wallach (2) maintained that space estimations at
the open end were underestimated while those at the apex were over- 
estimated. 


Method 


1. Artificial Series. The types of scope faces used in the artificial
series are presented in Figures 1 to 6. The figures on the left are for the
sector-type of radar scope, commonly used, and show the condition for
the perspective illusion. The figures on the right are rectangular
presentations used as a “control” with the same kind of problems. All scope
pictures were 7 inches in diameter, viewed at a distance of 16 inches, or (in
group experiments) with an equivalent visual angle.

2. Natural Series. Figure 7 exhibits a photograph of a real radar
sector-type scope, one of the stimulus series which we presented with the
ultra-violet radar simulator. The white spot at the right is a signal from
an approaching airplane about to land. In this series it was impossible
to use the rectangular scope for comparison, because no scopes were
made that way.

Fig. 1. Sector “unscaled” scope.
Fig. 2. Rectangular “unscaled” scope.
Fig. 3. Sector scope with 100-foot side lines.
Fig. 4. Rectangular scope with 100-foot side lines.
Fig. 5. Sector scope with multiple scaling.
Fig. 6. Rectangular scope with multiple scaling.



All projected images were on a phosphorescent radar screen, of 
typical color and signal persistence, except in the group experiments. 
Signals were presented serially in a fairly realistic rate of progression. 
All signals were white on a black field. 

Subjects were scored for error and reaction time by recording verbal 
answers as rapidly as made, and using a chronoscope actuated by a 
voice key. 

There were three types of scaling assistance, as indicated in Figures
1 to 6. (a) The “unscaled” scope showed merely a zero line of reference
with a marginal standard for space values (Figures 1 and 2). (b) The
“100-foot scope” used the same zero reference line, but had a parallel
line, above and below, to show 100 feet of signal deflection. The lines
representing 100 feet were 0.4 inch above and below the zero line (Figures
3 and 4). (c) The scope with “multiple scaling” had fine lines, 0.1 inch
apart, with every fourth line heavier. Fine lines represented increments
of 25 feet. Heavy lines represented 100 feet deflection increments
(Figures 5 and 6).

Fig. 7. Photograph of sector presentation on a real
radar scope used for ground-control-approach.

Results ¹

It will be necessary to remember that these experiments were run on 
rather complex equipment, generally one run at a time, and that the 
number of readings possible was thus restricted, and the number of 
subjects used was necessarily limited. All differences were subjected to 
calculations of critical ratio, and the differences in means with a signifi- 
cance better than the one percent level have been shown in italics in all 
tables. It will be obvious that statements made under conclusions are 
qualified. 
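
For readers unfamiliar with the critical ratio, the following is a minimal sketch of one common form of it: the difference between two means divided by the standard error of that difference, referred to the normal curve. The text does not give the exact formula used, so this form, the function names, and the sample figures are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means divided by the standard error of the difference."""
    se_a = sd_a / math.sqrt(n_a)
    se_b = sd_b / math.sqrt(n_b)
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def two_tailed_p(cr):
    """Two-tailed probability of a ratio at least this large under the normal curve."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(cr)))

# Hypothetical means, standard deviations and run counts for two conditions.
cr = critical_ratio(10.0, 2.0, 25, 8.0, 2.0, 25)
better_than_one_per_cent = two_tailed_p(cr) < 0.01
```

Under this form a ratio of about 2.58 or more corresponds to the one per cent level on a two-tailed test.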

¹ Expanded tables, in much greater detail, have been reported by A. Ford and M. G.
Getz, The Perspective Illusion in Radar Sector Scopes, Technical Report No. 1, Contract
W28-099-ac-130, Watson Laboratories, Air Materiel Command, USAF, 10 June 1948.
It was not possible to keep subjects completely unsophisticated with
respect to the existence of the illusion, during the 18 months of work. 
Therefore we shall show training effects by separate tables. 

1. Untrained Subjects on Artificial Scopes. The introductory or 
trial experiments were done with the artificial scope pictures of Figures 
1 to 6. The trend or “drift” of possible overestimation compares the
one-third area of the open end of the sector with the one-third area at 
the apex end of the scope field. This trend is reduced to a single figure 
expressed in percentage of over or underestimation. A similar problem is 
then shown for the same subject on a rectangular scope, calculated in the 
same manner, and used as a control. Presentations were randomized. 


Table 1

The Perspective Illusion in Untrained Subjects
Percentages of Error in Elevation Judgments
Individual Experiments, Artificial Scope
Combined Data for Four Subjects

Plus signs indicate overestimations; minus signs, underestimations.

                    Sector Scope (Experimental)    Rectangular Scope (Control)
Type of Scaling     Left               Right                                   Difference
Assistance          (Apex)   Center    (Open End)  Left     Center   Right     (Illusory Trend)

Unscaled            +19.6    -9.8      -.1         -1.1     -.3      +2.3      +23.1
100 ft. Lines       +14.0    -.3       +2.4        +.8      +3.1     -.4       +10.4
Multiple Lines      -4.5     +2.8      +1.1        +.1      +2.5     +1.5      -4.2

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are averages based on 40 to 55 runs.



The first, or introductory, runs showed a strong illusory trend of
overestimation at the apex of the sector scope (Table 1) when the field was
“unscaled.” The introduction of a pair of 100-foot reference lines (0.4
inch on each side of the line of zero position) had a marked effect in
reducing the illusion, but did not eliminate it. When the multiple scaling
system was used (Figures 5 and 6) there was no reliable evidence of
illusory effect.
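
The “Difference (Illusory Trend)” column of Table 1 is not defined by an explicit formula in the text. The tabulated values are, however, consistent with taking the apex-minus-open-end drift on the sector scope and subtracting the corresponding left-minus-right drift on the rectangular control. The sketch below assumes that reading, with the 100-ft. and multiple-line rows of Table 1 as input; the function name is hypothetical.

```python
def illusory_trend(sector_apex, sector_open, rect_left, rect_right):
    """Sector drift (apex minus open end) corrected by the control drift.

    Assumed reconstruction of the 'Difference (Illusory Trend)' column;
    all arguments are percentages of over- (+) or underestimation (-).
    """
    sector_drift = sector_apex - sector_open
    control_drift = rect_left - rect_right
    return sector_drift - control_drift

# Rows of Table 1 (untrained subjects, artificial scope).
trend_100_ft = illusory_trend(14.0, 2.4, 0.8, -0.4)    # tabulated as +10.4
trend_multiple = illusory_trend(-4.5, 1.1, 0.1, 1.5)   # tabulated as -4.2
```

The same subtraction, with a single sector value and a single rectangular value per subject, also appears to account for the difference columns of Tables 3 and 4.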

2. Partially Trained Subjects on Artificial Scopes. After 40 to 55 
runs, the next section of the training series (Table 2) showed a reduction 
in the amount of the illusion on the unscaled scope, and no significant 
illusory trends for the scopes with 100 foot scaling lines, or the multiple 
scaling system. Evidently scaling reduces the illusion. 

Table 2

The Perspective Illusion, Second Training Stage
Percentages of Error in Elevation Judgments
Individual Experiments, Artificial Scope
Combined Data for Four Subjects

Plus signs indicate overestimations; minus signs, underestimations.

                    Sector Scope (Experimental)    Rectangular Scope (Control)
Type of Scaling     Left               Right                                   Difference
Assistance          (Apex)   Center    (Open End)  Left     Center   Right     (Illusory Trend)

Unscaled            +11.8    +6.9      -2.7        -2.0     -7.1     +.7       +17.2
100 ft. Lines       +3.9     +3.1      +4.1        -.3      +2.1     +.7       +.8
Multiple Lines      -.3      -1.8      +.6         +.8      +.4      +1.7      -1.8

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are based on 55 to 109 runs.

3. Individual Differences. Dealing with the spread of “random
errors,” all subjects were very much alike, showing closely similar
standard errors. However, with respect to the illusion, it had become evident
that some subjects have strong susceptibility, and an occasional subject
seems to be completely free from any proneness. Table 3 is presented
to show individual differences. One subject entered too late to be given
the runs on the unscaled scope, but is included to complete the data.


Table 3

The Perspective Illusion, Second Training Stage
Percentages of Error in Elevation Judgments
Individual Experiments, Artificial Scope
Data for Individual Differences

Plus signs indicate overestimations; minus signs, underestimations.
Sector = Sector Scope (Experimental); Rect. = Rectangular (Control);
Diff. = Difference (Illusory Trend).

               Unscaled Scope             100-ft. Reference Lines    Multiple Scaling
Initials of    Sector   Rect.    Diff.    Sector   Rect.    Diff.    Sector   Rect.    Diff.
Subjects

mn. J. RB.     +27.1    -1.5     +28.6    +.3      -1.4     +1.7     -.9      +5.0     -5.9
. Fs Bs.       +16.9    -7.8     +24.7    +.6      -2.1     +2.7     +.3      -2.8     +3.1
L. A. A.       +12.3    -1.0     +13.3    +1.9     -.2      +2.1     -.4      -2.9     +2.5
D. M. S.       +3.2     +1.6     +1.6     -9.0     +1.0     -10.0    -2.3     -.3      -2.0
W. A. S.       (New Subject)              +2.2     -5.5     +7.7     -1.7     -4.1     +2.4

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are based on from 13 to 25 runs.
This is a breakdown from data in the series for partially trained subjects.
Three subjects are quite strongly susceptible. One is not. 

This led us into an experiment in which the same problems were
projected on a large screen for a group experiment on untrained subjects,
with visual angle kept approximately the same. Table 4 arranges
these subjects in order from most susceptible to least susceptible. The
purpose of this section was to secure more extended data on individual
differences. Maximum susceptibility reached 65% overestimation at
the apex of the sector sweep.

Table 4

The Perspective Illusion in Untrained Subjects
Percentages of Error in Elevation Judgments
Group Experiments, Artificial Scope

Sector = Sector Scope (Experimental); Rect. = Rectangular (Control);
Diff. = Difference (Illusory Trend).

               Unscaled Scope             100-ft. Reference Lines    Multiple Scaling
Initials of    Sector   Rect.    Diff.    Sector   Rect.    Diff.    Sector   Rect.    Diff.
Subjects

[illeg.]       +33.3    -31.2    +64.5    +1.3     +7.4     -6.1     -2.4     +8.0     -10.4
[illeg.]       +55.5    -.6      +56.1    +8.3     -16.7    +25.0    -2.2     +5.9     -8.1
[illeg.]       +52.8    -2.9     +55.7    +39.3    +.8      +38.6    -7.4     -3.3     -4.1
[illeg.]       +52.8    -2.0     +54.8    +3.3     -4.3     +7.6     +16.8    -17.2    +34.0
[illeg.]       +49.8    -.8      +50.6    +.5      -2.3     +2.8     +6.7     [illeg.] +18.3
[illeg.]       +65.0    +14.6    +50.4    +6.9     +2.6     +4.3     [illeg.] [illeg.] -.1
[illeg.]       +36.5    +.1      +36.4    +6.4     -5.8     +12.2    [illeg.] [illeg.] +.1
[illeg.]       +26.1    -5.4     +31.6    +5.6     -3.1     +8.7     [illeg.] [illeg.] -2.8
[illeg.]       +17.2    -11.1    +28.8    -3.1     -.9      -2.2     [illeg.] [illeg.] -4.1
[illeg.]       +25.5    +1.3     +24.2    -.1      +.2      -.3      [illeg.] [illeg.] +2.2
[illeg.]       +18.2    -5.2     +23.4    +3.1     -1.5     +4.6     [illeg.] [illeg.] -6.2
[illeg.]       +14.8    -6.0     +20.8    +6.9     -1.3     +8.2     [illeg.] [illeg.] -1.9
S. K. M.       +23.7    +3.1     +20.6    +3.1     -20.7    +23.8    [illeg.] [illeg.] +2.4
M.:F se        +16.9    -3.7     +20.6    +.7      +.5      +.2      [illeg.] [illeg.] +42.2
C.&. 8         +31.4    +12.9    +18.5    -6.1     -9.9     +3.8     [illeg.] [illeg.] +5.6
A. H. F.       +11.4    -5.0     +16.4    -1.1     -2.6     +1.5     [illeg.] [illeg.] -2.3
C. E. F.       +23.3    +7.5     +15.8    +23.3    -10.3    +33.6    [illeg.] [illeg.] -2.8
M. S. W.       +5.8     +2.2     +3.6     +1.0     -16.0    +17.0    [illeg.] [illeg.] +9.4
D. A. W.       -.4      +13.9    -14.3    +.5      -3.0     +3.5     [illeg.] [illeg.] +20.1

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are averages based on from 14 to 28 runs.

4. Untrained Subjects on Reproductions of Field Scopes. Six new 
and untrained subjects were now tried on reproductions of radar field 
runs, using the ultra-violet projection radar simulator. Three are 
strongly susceptible (see Figure 7 and Table 5). One is suspected of 
being moderately susceptible. Two showed no reliable differences. 
Table 5

The Perspective Illusion in Untrained Subjects
Percentages of Error in Elevation Judgments
Individual Experiments, Reproductions from Field Scopes

Plus signs indicate overestimations; minus signs, underestimations.
Apex = Apex of Sector; Open = Open End; Diff. = Difference (Illusory Trend).

               Unscaled Scope             100-ft. Reference Lines    Multiple Scaling
Initials of    Apex     Open     Diff.    Apex     Open     Diff.    Apex     Open     Diff.
Subjects

J. H. J.       +21.4    -23.4    +44.8
aie            +4.4     -21.9    +26.3    -1.0     -13.9    +12.9    +2.7     -14.6    +17.8
[illeg.]       +4.6     -17.1    +21.7    -5.8     -5.7     -.1
[illeg.]       -6.3     -13.8    +7.5     +8.7     -22.7    +31.4    -6.9     +.1      -7.0
[illeg.]       -7.0     -13.3    +6.3     -3.0     -5.5     +2.5     -.6      -1.6     +1.0
[illeg.]       -14.5    -2.8     -11.7    -2.7     -9.9     +7.2     -6.5     +7.0     -13.5

Note: Difference values expressed in italics are better than the one per cent level of
significance. Values are averages based on from 17 to 48 runs.


5. Trained Subjects on Reproductions of Field Scopes. There were
two subjects who had started with the program, 18 months before. These
are considered as trained subjects. The data are presented in Table 6.
Neither has a reliable difference between the apex and the open end of
the scope, either for unscaled or scaled scopes. One subject, L. A. A.,
had been mildly susceptible in the earlier stages (see Table 3). The
other subject, D. M. S., had never been susceptible, even on the artificial
scopes. There seems to be evidence that, for some subjects, training
either decreases the illusory effect, or even eliminates it.

Table 6

The Perspective Illusion in Trained Subjects
Percentages of Error in Elevation Judgments
Individual Experiments, Reproductions from Field Scopes

Plus signs indicate overestimations; minus signs, underestimations.
Apex = Apex of Sector; Open = Open End; Diff. = Difference (Illusory Trend).

               Unscaled Scope             100-ft. Reference Lines    Multiple Scaling
Initials of    Apex     Open     Diff.    Apex     Open     Diff.    Apex     Open     Diff.
Subjects

L. A. A.       -7.8     -9.0     +1.2     -6.0     +2.7     -8.7     -2.2     +3.9     -6.1
D. M. S.       -5.5     -1.5     -4.0     -5.6     -2.8     -2.8     -2.9     -2.7     -.3

Note: All differences have a significance poorer than the one per cent level.

6. The Köhler-Wallach Principle. In dealing with the perspective
illusion Köhler and Wallach (2) stated that space judgments in a triangu-
lar area showed overestimation at the apex and also underestimation at
the open end, whereas many of the textbooks stress only the overestima-
tion at the point of the triangle. The Köhler-Wallach principle is well
illustrated among the first three subjects of Table 5. In the artificial 
scope series the principle was still there, though this has been omitted 
in the form of our tabulation, but it was much milder, with only slight 
tendencies toward underestimation at the open end. 


Summary 


1. When the position of signals on the area of a sector-type scope 
reaches the apex of the scan-line sweep, in what is essentially a triangular 
area, overestimations of space reach as much as 65% for some subjects. 

2. The great majority of all subjects show some degree of suscepti- 
bility, but the range of individual differences extends from complete lack 
of proneness to about 65% relative overestimation at the apex. Within 
the limitations of the number of subjects used, it might be expected that 
90% of all subjects show some degree of the illusion. 

3. The design of the field of a radar scope must take into consideration 
the shape of the field for its total effect on scope reading errors. With 
respect to this illusion, clearly visible multiple scaling will reduce the 
space distortion, but judgment must be deferred lest other types of 
errors are introduced, and these will be presented in later articles. 
Types of Errors in Location Judgments on Scaled Surfaces.
II. Random and Systematic Errors * 


Adelbert Ford 
Department of Psychology, Lehigh University 


A large variety of instruments require operators to report the position 
of a “signal,” such as a white spot, by reading its position with reference
to superimposed scaling lines. In dealing with types of radar associated 
with the navigation of aircraft a single large error could cause loss of life 
and the destruction of expensive equipment. 

In the last article ¹ we noted the existence of errors caused by the
shape of the field surface. In the present article, using the same scaling 
and problem sequences, we propose to show: (1) the size of the random 
errors caused by the limiting effects of interpolating scale values of specific 
scales, and (2) certain systematic errors consisting in particular of the 
confusion error, defined as a mistaken interpretation of the numerical
value of the scale points, and what we shall call persistence errors, defined 
as a proneness of some subjects to bias reports in a sequential series by 
memory effects of the previous reports. 

Although the present report is specifically concerned with position 
reporting from scaled areas, it will probably be instantly perceived that 
some of the principles are perhaps equally applicable to linear scales. 
The consequences of this error analysis are much more basic than the 
narrow application to radar scopes. 


Fineness of Scaling and Random Error ²


As illustrated in the previous article, there were three types of scaling
used for these experiments: (1) a scope with a zero line of reference across
the field, but no scaling assistance other than a sample scale printed
on the side of the scope for comparison; (2) a scope with a so-called
“100-foot Reference Line” located parallel to and 0.4 inch away from the
zero line of reference; and (3) a scope with a multiple system of parallel
lines, separated by tenths of an inch, each line representing 25 scaled feet.

* This research was executed under Contract No. W28-099-ac-130 between the
Institute of Research, Lehigh University, and the USAF Air Materiel Command,
Watson Laboratories, Red Bank, N. J. The investigation was made to ascertain the
accuracy of radar operators in the interpretation of scope signals.

¹ Ford, A. Types of errors in location judgments on scaled surfaces: Errors of
configuration. This Journal, Vol. 33, August, 1949.

² Readers who possess a cleared status for restricted reports will find a more
elaborated description of the tables and calculations in: A. Ford and M. H. Getz, Types of
Errors in the Reading of GCA Scaled Scopes, Technical Report No. 4, Contract
W28-099-ac-130, Watson Laboratories, Air Materiel Command, USAF, 31 August 1948.
Restricted.

For practical reasons, the errors were all reduced to percentage values 
in this section of the data, using only pips which were 50 or more scaled 
feet from the zero line of reference. Figures 1 to 3 are based on the 
composite records of five subjects. (It will be shown later that individual 
differences in random error are small.) 

Figure 1 shows that, for the unscaled scope, the standard error was
11.09% of the space being estimated. Figure 2 shows that the use of
side lines, 0.4 inch away from the zero line of reference, reduced the
standard error to 8.48%. Figure 3 shows that with the use of a multiple
system of lines, one tenth inch apart, the standard error is now reduced
to 4.59%.

Fig. 2. Distribution of errors on the scope with 100-ft. reference lines.
Fig. 3. Distribution of errors on the scope with multiple scaling.
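
A standard error expressed as a percentage of the space being estimated can be obtained roughly as in the sketch below. The text states only that errors were reduced to percentages and that pips nearer than 50 scaled feet to the zero line were excluded; the use of a population standard deviation about the mean, the function names, and the sample readings are assumptions for illustration.

```python
from statistics import pstdev

def percent_errors(true_feet, reported_feet, min_feet=50.0):
    """Signed errors as percentages of the true deflection from the zero line.

    Pips closer than min_feet to the zero line are dropped, as in the text.
    Positive values mean the deflection was overestimated.
    """
    errors = []
    for true_value, reported in zip(true_feet, reported_feet):
        if abs(true_value) >= min_feet:
            errors.append(100.0 * (abs(reported) - abs(true_value)) / abs(true_value))
    return errors

def percent_standard_error(true_feet, reported_feet):
    """Spread of the percentage errors (population standard deviation)."""
    return pstdev(percent_errors(true_feet, reported_feet))

# Hypothetical readings; the 25-foot pip is excluded by the 50-foot rule.
true_feet = [100.0, 100.0, 25.0]
reported_feet = [110.0, 90.0, 40.0]
spread = percent_standard_error(true_feet, reported_feet)
```

Expressing each error relative to the deflection being judged is what makes the three scaling conditions comparable across pips at different elevations.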

Now Garner (1) has shown that on PPI-type scopes, with scaling 
in the form of concentric rings, scaling of the degree of fineness in our 
multiple system produced confusion errors, decreased accuracy and 
promoted longer reaction times. We shall substantiate Garner’s state- 
ment with respect to confusion errors, but we shall have to indicate, 
from evidence in Figures 1 to 3, that the smallest spread of random error 
was produced for the finer scaling. We found no statistically reliable 
difference in verbal reporting reaction time. This may be a difference 
between human reactions on polar scaling, which Garner used, and
rectangular scaling, which we used. 

At this stage in the experiments we went into a more detailed gathering
of data on the finely scaled scopes, to see whether the advantage
of a smaller random error was offset by the presence of systematic
errors which could not be tolerated.


Absolute Amount of Random Error 


Since we have ascertained that the more finely scaled scope yielded 
the smallest random error, in percentage figures, we shall now confine our 
measurements to the absolute values in this scaling situation (lines in 
tenths of an inch, representing 25 scaled feet of elevation, with 100-foot 
lines emphasized). 

In Tables 1 and 2 the standard deviation of the error spread is com- 
puted omitting the confusion errors around the 100-foot scaling line, 
which are obviously not random. Mistaken numerical interpretations 
around the 25-foot scaling line cannot be distinguished easily from random 
errors, but we shall make an attempt, later, to show they exist by statis- 
tical analysis. 

Individual differences, for untrained subjects on group experiments,
with clear, uniform signals, are presented in Table 1. It appears safe to
say, from these data, that average intelligent operators should be able to
report elevation deflections to a standard error of plus-or-minus 0.020
inch of scope distance, under such conditions. This represents an error
in judging the elevation of a plane of five or six feet, presumably trivial.
Trained subjects are much more nearly alike in error spread, and we
have combined the runs in Table 2 to show the absolute error under six
different experimental conditions.

Table 1

Random Error, Standard Deviation, Group Experiments, Individual Differences

In the following table the subjects are arranged in the order of best to worst, and all
are untrained. The scaling consists of the multiple system with lines a tenth of an inch
apart.

Stand. Dev.     Stand. Dev.     Number
in Scaled       in Scope        of
Feet            Inches          Readings

2.8             .011            89
4.1             .016            89
4.5             .018            114
4.5             .018            90
4.5             .018            89
4.6             .018            88
4.6             .018            64
4.7             .019            89
4.8             .019            63
4.9             .019            90
4.9             .019            89
5.2             .021            87
5.5             .022            90
5.6             .022            88
5.8             .023            86
6.0             .024            110
6.1             .024            88
6.5             .026            88
7.0             .028            87

Note: Confusion errors at the 25-foot minor scaling line cannot be accurately
separated from random errors. The above standard errors include these, and are probably
all too large. See Table 3 for an attempt at separation.
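
The correspondence between scope inches and scaled feet follows from the multiple scaling described earlier, in which lines one tenth of an inch apart each represent 25 scaled feet. A minimal sketch of the conversion is given here; the constant and function names are hypothetical.

```python
# Multiple scaling: lines 0.1 inch apart, each representing 25 scaled feet.
FEET_PER_LINE = 25.0
INCHES_PER_LINE = 0.1
FEET_PER_INCH = FEET_PER_LINE / INCHES_PER_LINE  # 250 scaled feet per scope inch

def inches_to_scaled_feet(inches):
    """Convert a distance on the scope face to scaled feet of elevation."""
    return inches * FEET_PER_INCH

def scaled_feet_to_inches(feet):
    """Convert scaled feet of elevation to distance on the scope face."""
    return feet / FEET_PER_INCH

typical_error_feet = inches_to_scaled_feet(0.020)   # the 0.020-inch standard error
major_line_inches = scaled_feet_to_inches(100.0)    # the 100-foot major line
```

On this scale a standard error of 0.020 inch comes to about five scaled feet, and the 100-foot line falls 0.4 inch from the zero line, in agreement with the figures quoted in the text.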


Table 2 


Distributions of Errors under Various Conditions, Elevation Reporting, 
Multiple Scaling, All Subjects Combined 


For a description of the character of each run, as designated by A, B, C, D, E, and F, 
see page 387 of the text. 








Error, Character of Run 
Scaled 


Feet (A) (B) (C) (D) (E) (F) Location of Types of Errors








+110 Approximate band of confusion errors 
+105 around the 100-foot major scaling 


+100 line. Errors of overestimation. 
+95 


+90 


+85 Approximate band of confusion errors 
+80 around the 75-foot minor scaling 


+75 line. Errors of overestimation. 
+70 


Approximate band of confusion errors 
around the 50-foot minor scaling 
line. Errors of overestimation. 


Approximate band of confusion errors 
around the 25-foot minor scaling 
line. Errors of oversetimation. 


Central band of random errors. 








Types of Errors in Location Judgments on Scaled Surfaces. II 387 


Table 2 (Continued)


                 Character of Run
(A)  (B)  (C)  (D)  (E)  (F)    Location of Types of Errors

2 4 3 32 77       Approximate band of confusion errors
3 3 19 27         around the 25-foot minor scaling
1 3 13            line. Errors of underestimation.
1 2 6

4                 Approximate band of confusion errors
                  around the 50-foot minor scaling
1                 line. Errors of underestimation.

                  Approximate band of confusion errors
                  around the 75-foot minor scaling
                  line. Errors of underestimation.

                  Approximate band of confusion errors
-95               around the 100-foot major scaling
-100              line. Errors of underestimation.
-105

S.D. Feet
S.D. Inches





The six conditions in Table 2 are as follows: 


Condition A. Five trained subjects. Individual experiments. Arti- 
ficial scope with clear uniform signals. Rectangular presentation. 
Single-task elevation reporting. 

Condition B. Nineteen untrained subjects. Group experiments 
before a large screen. Same problem materials as Condition A. Rec- 
tangular display. Single-task elevation reporting. 

Condition C. Five trained subjects. Individual experiments. Arti- 
ficial scope with clear uniform signals. Sector presentation. Single- 
task reporting. 

Condition D. Nineteen untrained subjects. Group experiments 
before a large screen. Same problem materials as in Condition C. 
Sector display. Single-task reporting. 

Condition E. Six trained subjects. Individual experiments. Simulator
reproductions of field radar. Typical pip variations in contour,
size, brightness, shape, and hazy edges. Sector display. Single-task
reporting.

Condition F. Six trained subjects. Individual experiments. Simu- 
lator reproductions of field radar. Same problem materials as Condition 
E. Sector display. Double-task reporting, alternating elevation re- 
ports with range reports. 

The standard deviation of each error distribution appears at the base of
its column in Table 2, expressed both in scaled feet and in inches of
actual scope distance.

Conditions A, B, C, and D all involve artificial scope pictures with
clear, uniform signals. The conclusion that an average operator should
be able to interpret distances, under these conditions, to a standard
error of plus-or-minus 0.020 inch is again substantiated. If a radar
scope could be designed with such clear and uniform pips, and with
scaling of this degree of fineness, this is the accuracy that could be
expected of the human operator.

Condition E, using reproductions of an actual radar scope, shows 
that the random error is about doubled, due to signals which vary in 
shape, size, intensity, haziness of edges, etc. In the artificial series the 
reports were ten seconds apart. In this simulator series the operator 
reported every tenth pip, with the scan-line crossing the scope once 
every second. Rate of reporting was approximately the same, therefore. 

Condition F is just like Condition E, except that the operator had to 
keep his attention on two tasks in alternation, elevation reporting and 
range reporting. The increase in standard error, from 9.8 feet to 13.3 
feet, represents the effect of giving an operator an additional task. It 
may be presumed that the more tasks the radar operator is required to 
do simultaneously the less accurate he will be on each. This conclusion 
may seem to be something like proving the obvious, but it must be 
remembered that there is a proposal to make one man do what was 
previously done by from 3 to 5 men on GCA radar installations. The 
need for one-man operation is urgent, and the present study is merely an 
attempt to show that multiple tasks must be accompanied by extreme 
work simplification, if we are to avoid intolerable reporting errors. One 
confusion error, of the amount shown in Table 2 at the 100-foot line, 
could wreck an air transport. 
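The size of the double-task penalty can be put as a proportion of the single-task error. The sketch below uses only the two standard errors quoted above (9.8 and 13.3 scaled feet); the function name is illustrative.

```python
def relative_increase(before: float, after: float) -> float:
    """Fractional increase of `after` over `before`."""
    return (after - before) / before


SD_SINGLE_TASK_FEET = 9.8   # Condition E, elevation reports only
SD_DOUBLE_TASK_FEET = 13.3  # Condition F, elevation alternating with range

if __name__ == "__main__":
    increase = relative_increase(SD_SINGLE_TASK_FEET, SD_DOUBLE_TASK_FEET)
    print(f"Increase in standard error: {increase:.1%}")
```

On these two figures the increase is a little over one third, the same order as the "about 30%" quoted in the Summary.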

Figure 6 shows the fit of a normal curve of distribution to the actual 
error distribution for the data of Condition E, reproductions of actual 
radar scopes. 


Confusion Errors 


Scales, both linear and surface types, consisting of major lines with
numerical values, and minor divisions which are supposed to assist in
interpolation, are subject to mistaken interpretation of figures and errors
in counting division points.

Fig. 4. Type of fit for normal curve when errors at 25 ft, the position of
a minor scale division, have been included. (Observed frequency compared with
a normal curve of the same area and standard deviation. Abscissa: negative and
positive errors in scaled feet, 1 ft. = .004 scope inches.)

Table 2 shows clear evidence of mistaken interpretation at the
100-foot value. This was verified many times by subjective reports. The
100-foot line is called a 200-foot line, or the line of zero reference is
mistaken for a 100-foot side line. There was no case of an error as great as
200 feet, but it was theoretically possible.

Also, at the 75-foot, the 50-foot, and the 25-foot distances there is an
equal probability of assigning wrong numerical interpretations. These
are fairly clear at 50 feet and up. Unfortunately the confusion errors
at the value of 25 feet overlap with the curve of random error. In fact,
there is no way of separating confusion from random errors, at this
position, but there may be a statistical way of showing facts which
support the belief that they must be there.

Fig. 5. Hypothetical improvement of normal curve fit when errors at the 25 foot
scaling position have been excluded. Presented to explain the χ² improvement shown
in Table 3. (Observed frequency compared with a normal curve fitted to the center
five steps, with confusion errors near the -25 ft. and +25 ft. scaling lines.
Abscissa: negative and positive errors in scaled feet, 1 ft. = .004 scope inches.)

Assuming that random error distributions should approach the curve
of normal probability, an hypothesis which has considerable support, and
that systematic errors will cause typical and expected distortions from
normality, we may resort to the χ² test for these data. In this use of
the Fisher technique, what should prove of interest is not just the bald
fact that a misfit has occurred, but where in the curve the misfit is
found, that is, whether or not it lies over the values which correspond
to the minor or major scale points. It is the location of the misfit that
may reveal the presence of confusion errors mixed with random errors at
the 25-foot distance.

Fig. 6. Normal curve fit: random errors. Elevation reports in double task experiments.
(Observed frequency compared with a normal probability curve of equal area and
standard deviation. Abscissa: error in scaled feet, -45 to +45.)

Figure 4 shows the typical result we get when we try to fit a normal
curve on our error distributions. The normal curve is plotted using the
standard deviation of the distance from -35 to +35 feet, which includes
confusion errors around 25 feet.

The χ² test always resulted in too many errors over the 25-foot position,
and the discrepancy was always positive for every distribution beginning
with Condition A through and including Condition E. This always
produced the appearance of a leptokurtic hump at the center.


Table 3

Artificial Scope Runs
χ² Tests of Curve Fit for a Normal Distribution of Error,
Central Band of Random Error

In the following table the errors from -15 feet to +15 feet are hypothetically
considered as being the band for pure random error (see Table 2), and that this region
should fit a normal probability curve. The fit to the central band is tried two ways:
with the supposed confusion errors included, and with the confusion errors excluded,
i.e., by computing the standard deviation only on the central band.

           Curve       Stand.                   Stand.                   Number of
           Area of     Dev.        χ² Fit       Dev.        χ² Fit       Readings,
Condi-     Central     -35 to      Central      -15 to      Central      Central
tion       Band        +35 feet    Band         +15 feet    Band         Band

(A)        98.5%        4.31       167.25       3.40        80.19        1499
(B)        98.8%        5.10        24.28       4.65         5.89        1588
(C)        97.7%        5.01       225.97       3.70        48.67        1613
(D)        97.7%        5.30        97.01       4.40         9.92        1638
(E)        81.8%        9.80        39.79       8.50        23.27        1280
(F)                    13.33         1.78       -           -            -

Note: The central band for Condition F was from -35 to +35, and was divided into
five step intervals. No attempt was made to improve the fit because this was already
as good as could be obtained. The area of this central band was 99.5% of the total
distribution.


Figure 5 shows our hypothesis of what would happen if we deter- 
mined the standard deviation by the central band of random error, only, 
and deliberately assumed that the excess of readings over the 25-foot 
point is due to confusion errors, not random errors. 

Therefore, we adjusted the standard deviation value to fit the central
band of error values, from -15 to +15 feet, and applied the χ² test again.
The differences between the two assumptions are shown in final χ²
answers in Table 3. Without exception, for the artificial scope series
A to D inclusive, a χ² fit for 98% of the readings was greatly improved.
The astonishing thing was the discovery that the distribution for
Condition F, double task reporting, was already almost a perfect normal curve,
and could not be improved by any assumptions of systematic distortion.
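The comparison just described can be sketched as a χ² goodness-of-fit computed twice on the same central band, once with the standard deviation taken over -35 to +35 feet and once with the standard deviation of the central band only. The sketch below is an illustration under stated assumptions, not the report's own computation: the bin edges (five-foot steps centered on multiples of five feet), the zero mean, the scaling of the curve to the area of the band, and the frequencies in `OBSERVED` are all hypothetical. Only the two standard deviations are taken from Table 3 (Condition A).

```python
import math


def normal_cdf(x: float, sd: float) -> float:
    """Cumulative probability of a zero-mean normal curve with the given S.D."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def chi_square_normal_fit(edges, observed, sd):
    """Pearson chi-square of binned counts against a zero-mean normal curve.

    The curve is scaled to the same area (total count) as the observed
    frequencies within the band covered by `edges`.
    """
    probs = [normal_cdf(hi, sd) - normal_cdf(lo, sd)
             for lo, hi in zip(edges[:-1], edges[1:])]
    band_area = sum(probs)
    total = sum(observed)
    expected = [total * p / band_area for p in probs]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for the central band (NOT the report's data).
EDGES = [-17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5]
OBSERVED = [3, 35, 270, 890, 262, 37, 3]

if __name__ == "__main__":
    # S.D. values for Condition A as listed in Table 3.
    for label, sd in (("S.D. over -35 to +35 ft", 4.31),
                      ("S.D. over -15 to +15 ft", 3.40)):
        chi2 = chi_square_normal_fit(EDGES, OBSERVED, sd)
        print(f"{label}: sd = {sd:.2f}, chi-square = {chi2:.2f}")
```

The same function can be run on any binned error distribution; only the choice of standard deviation changes between the two columns of χ² values in Table 3.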

We are inclined to believe, therefore, that the approximate bands 
for the regions of confusion errors in Table 2 are essentially correct. This 
means that, in reducing the random error by more finely divided scales, 
we have introduced an intolerable numerical confusion error, extremely 
dangerous for the practical navigation of aircraft by ground control radar. 
Therefore, no recommendation is made to use such a scale. More simpli- 
fied methods of signal tracking must be designed, especially for one-man 
operation. 


Persistence Errors 


A rather broad definition of a persistence error may be: It is the 
tendency of an operator to bias a present report because of the mental 
persistence of a previous report. 

We uncovered the existence of this possibility through two subjects 
whose data are plotted in Table 4. The first evidence was a sort of verbal 
stereotyping occurring when operators had to attend to two things 
alternately. Table 4 is drawn from the double-task experiments of 
Condition F. 


Table 4
Distributions of Errors
Range Reports
Reproductions of GCA Field Radar Scope

Error in
Scaled                       Initials of Subjects
Miles     R.C.   D.M.   D.M.S.   J.H.F.   W.A.S.   C.B.   Total

1 1       Band of persistence errors

          Band of random and confusion errors


An operator would be reporting consecutive range values, "six-point-two,
six-point-one, six-point-zero," and when he passed into the five-mile
zone he went on, "six-point-nine, six-point-eight," and then suddenly
remarked, "Oh, I meant five-point-eight." This is essentially the
situation for Table 4.

This led us to wonder if something similar to this might not have been
happening, to susceptible subjects, in the previous elevation serial
reporting. Therefore, we computed the algebraic mean of errors following
larger previous values, and subtracted this from the algebraic means of
reports following smaller previous reports. This difference is susceptible
to a calculation for reliability of differences of means. Table 5 shows the
results of this survey. Although only six out of twenty-six subjects
showed a significance of difference better than the one per cent level,
the general preponderance of plus values (20 out of 26) may carry some
weight.


Table 5

Trend of Algebraic Mean Error in Relation to Previous Report
Elevation Scale

A plus sign means that the subject tended to veer his reports in the direction of the
preceding report. A minus sign means that the subject tended to bias away from the
preceding report. The calculation is the difference in means where the preceding report
was higher as compared with readings where the previous report was lower. Figures in
italics are better than the one per cent level of significance. Differences are in scaled feet.

1. Group Experiments, Artificial Scope

Subject        Difference    Subject        Difference    Subject        Difference

M. B. C.       +2.38         W. J. K.       +.32          N. J. R.       +.79
A. H. F.       -.24          S. K.          +.18          R. K. S.       +2.05
C. E. F.       +.45          [illegible]    +1.56         R. B. T.       +1.91
D. L. H.       +.88          [illegible]    -.61          C. A. W.       -1.37
J. d.          +1.18         [illegible]    -.28          M. S. W.       +.21
[illegible]    +1.07

2. Individual Experiments, Artificial Scope

[illegible]    +1.05         F. P. H.       -.31          D. M. S.       [illegible]
[illegible]    +1.02         R. J. R.       +1.01

3. Simulator Reproductions of Field Radar

[illegible]    +5.30         J. H. F.       +3.07         W. A. S.       [illegible]
[illegible]    -2.24         D. M. S.       +.80
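The persistence measure and the weight of the "20 out of 26" count can both be sketched in a few lines. What follows is an illustration under assumptions, not the report's procedure: the sign convention follows the caption of Table 5 (plus means drifting toward the preceding report), "higher" and "lower" are judged by comparing the preceding report with the current true value, and the sign test is a swapped-in check that the report itself does not mention. All names and the toy data are hypothetical.

```python
from math import comb
from statistics import mean


def persistence_difference(reports, truths):
    """Mean error after a higher previous report minus mean error after a lower one.

    A positive result means reports drift toward the preceding report
    (the plus sign of Table 5). Both groups must be non-empty.
    """
    after_higher, after_lower = [], []
    for i in range(1, len(reports)):
        error = reports[i] - truths[i]
        if reports[i - 1] > truths[i]:
            after_higher.append(error)
        elif reports[i - 1] < truths[i]:
            after_lower.append(error)
    return mean(after_higher) - mean(after_lower)


def sign_test_upper_tail(plus_count: int, n: int) -> float:
    """Chance of at least `plus_count` plus signs in n subjects if + and - are equally likely."""
    return sum(comb(n, k) for k in range(plus_count, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Toy series (hypothetical): each report is pulled one foot toward the one before.
    truths = [10, 20, 10, 20]
    reports = [10, 19, 11, 19]
    print("difference:", persistence_difference(reports, truths))
    print("P(20 or more plus signs of 26):", round(sign_test_upper_tail(20, 26), 4))
```

Under the equal-chance assumption, 20 or more plus signs among 26 subjects would arise less than once in a hundred times, which lends some support to the remark that the preponderance of plus values may carry weight.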

Granting that some subjects are susceptible to this effect, the size of
the error trend is actually too small to be of any serious consequence for
the practical control of aircraft. A biasing effect of two feet, or even
five feet, would not be intolerable. On range reporting it is conceivable
that a mistake of one mile might be serious.


Summary 


1. The use of finer scaling, with minor scale divisions to tenths of an
inch viewed at sixteen inches, reduces random errors to a standard
deviation of plus-or-minus 0.020 inch of scope distance for clear uniform
pips, and 0.040 inch of scope distance for reproductions of actual
radar pips.

2. The introduction of this finer scaling produces a proneness for 
confusion errors, defined as misinterpretation of the numerical values of 
scale positions. These errors may reach such a size as to endanger the 
navigation of an aircraft being guided by such operating reports. 

3. Requiring an operator to alternate between two tasks in rapid 
succession has the effect of increasing the size of the random error, in our 
situation, about 30%. 

4. Some subjects have a tendency to bias each report in a series by
the mental persistence of the previous report. Only a minority of
subjects do this consistently, and the amount is too small to be of
practical significance.


5. Fine scaling, for one or more variables, is not recommended on the
basis of present data for radar scopes.


Some Design Factors in Making Settings on a Linear Scale *


William Leroy Jenkins and Minna B. Connor 
Lehigh University 


In setting a pointer on a linear scale by means of a control knob, is 
there an optimal ratio between pointer movement and knob turn? Is 
there an optimal knob diameter? When is a crank handle better than a 
knob? What is the effect of backlash in the system? No previous system- 
atic investigation of such design factors seems to have been made. 

The present study deals with a situation in which the operator is 
required to match a designated position on the scale with his pointer, 
rather than to set it to a specified numerical value. This limited phase 
of the problem was chosen because it permits data to be gathered rapidly 
and allows the accuracy of the setting to be objectively checked. 

The primary criterion employed is the time consumed in making a 
setting, since time is comparable from subject to subject, and from condi- 
tion to condition. In many of the experiments, action potentials from 
the active forearm were also picked up and measured. However, action 
potentials cannot be compared from subject to subject, since it is not 
known that the efficiency of the pick-up is the same in all subjects. 
For any given subject they do provide at least a rough indication of the 
relative amount of muscular work involved under different conditions. 


Apparatus 


Figure 1 is an operational diagram of the essential mechanical features 
of the apparatus. Rotation of the control knob turns the lower set of 
cone pulleys which drives the upper set of cone pulleys through a belt. 
Different ratios are obtained by shifting the belt. When the clutch is 
engaged, rotation of the upper shaft turns the drum and thus moves the 
pointer. When the clutch is disengaged, movement of the knob does 
not affect the pointer. 

The linear scale consists of a black bakelite bar with vertical inserts 
of lucite .032” wide at distances of 3, 9, 15, 21, 27, 33, 40, 56, 72, and 88 
sixteenths of an inch symmetrically from the center. Behind each 
insert is a tiny flashlight bulb. 


* This research was executed under Contract No. W28-099-ac-130 between the
Institute of Research, Lehigh University, and the USAF Air Material Command, Watson
Laboratories, Red Bank, N. J.

Through the center of the linear scale runs a thin metal strip which
is used in checking the accuracy of setting. The pointer can be tipped
to come in contact with the scale. If the pointer is entirely within the
limits of a lucite insert, it will not touch the metal strip. If it is off the
insert either way, it will come in contact with the metal strip and cause
a red pilot lamp to light. The limit of error-tolerance is thus established
by the width of the pointer.
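The tolerance that results from this arrangement is simple arithmetic on the two widths. The sketch below assumes that the pointer must lie wholly inside the insert, so that the free play is the insert width less the pointer width; the widths are the .032" insert given above and the .025" pointer listed under Standard Conditions, and the names are illustrative.

```python
INSERT_WIDTH_IN = 0.032   # width of each lucite insert
POINTER_WIDTH_IN = 0.025  # standard pointer width


def error_tolerance(insert_width: float, pointer_width: float) -> float:
    """Total play, in inches, within which the pointer stays off the metal strip."""
    return insert_width - pointer_width


if __name__ == "__main__":
    tolerance = error_tolerance(INSERT_WIDTH_IN, POINTER_WIDTH_IN)
    print(f"Error tolerance: {tolerance:.3f} inch")
```

This agrees with the .007" error-tolerance listed under Standard Conditions, and it shows why a wider pointer tightens the tolerance.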

The mechanical system is without backlash and is so adjusted that 
the pointer remains exactly where it was set after the clutch is released. 
To maintain these conditions, the belts must be quite tight; so that there 
is noticeable resistance at extremely coarse ratios. With the mechanical 
advantage of ratios in the medium and finer ranges, however, the opera- 
tion requires very little effort. 


Fig. 1. Mechanical features: operational diagram.


For measuring time, two chronoscopes are used, so that time for
travel to approximate location and time for making the final adjustment
can be separately determined. Similarly, two condensers are used to
accumulate amplified action potentials during the travel and adjust
phases. (Details of the electrical circuits and the four-stage amplifier
will be found in the Technical Summary Report of the project.)¹

¹ Jenkins, W. L. and Connor, M. B. Optimal Factors for Making a Setting on a
Linear Scale, Technical Report No. 3, Contract W28-099-ac-130, Watson Laboratories,
Air Material Command, USAF, 30 June 1948. Restricted.


Procedure


The procedure was essentially the same for all experiments. During 
a typical two-hour experimental session six or seven runs can be com- 
pleted. Each run consists of a series of 20 settings, involving all 20 of the 
lucite inserts in a scrambled order. The procedure for a single setting 
is as follows: 


1. After giving a preliminary warning signal, the experimenter closes 
a switch which simultaneously: (a) lights a pre-selected lucite insert; 
(b) starts both chronoscopes; (c) begins the accumulation of amplified 
action potentials in the first condenser. 

2. As soon as he sees an insert light up, the subject starts turning 
the knob to bring the pointer from the center of the scale to the designated 
position. When the pointer comes within one tenth of an inch of the 
lighted insert, a contact is automatically closed which simultaneously: 
(a) stops one chronoscope; (b) switches the accumulation of action 
potentials from the first to the second condenser. Thus the first chrono- 
scope and the first condenser provide measurement of the TRAVEL 
time and potential. 

3. When the subject has completed his setting, he pushes the clutch 
release with his non-operating hand. This action simultaneously: (a) 
stops the second chronoscope; (b) cuts the second condenser out of the 
circuit. Thus the second chronoscope and the second condenser provide 
the ADJUST measurements. 

4. The experimenter checks the accuracy of the subject's setting by
tilting the pointer against the scale. (Errors occur so rarely that the
very occasional "red light" reading is simply discarded.) The
experimenter records the readings of both chronoscopes, and discharges each
condenser separately into a sensitive ballistic galvanometer. The
apparatus can then be reset for another trial.


Method of Analyzing Data 


The raw data are in the form of time readings in tenth-seconds and
action potential readings in arbitrary meter-scale units, for the travel
and for the adjust phases of each setting. The adjust readings cause
no difficulty because they can be averaged directly. However, travel
readings vary according to the distance of the insert from the center.
Hence, travel readings are first plotted against distance traveled and a
straight line fitted. (The slope of this line is actually the travel rate, and
the y-intercept an estimate of the starting time or potential.) Then the
mean travel time (or potential) is scaled off for two standard distances:
10 sixteenths and 50 sixteenths of an inch. (The former is probably
more representative of the usual amount of movement required in making
discrete adjustments.) Mean total time (or potential) = mean travel
+ mean adjust.
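The reduction just described can be sketched as follows. The report fits the line on a plot and scales the values off; the sketch substitutes an ordinary least-squares fit, which is an assumption, and the travel and adjust readings shown are hypothetical. Only the ten insert distances are taken from the description of the apparatus.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_total(distances, travel, adjust, standard_distance):
    """Travel reading at a standard distance (from the fitted line) plus mean adjust."""
    slope, intercept = fit_line(distances, travel)
    mean_travel = intercept + slope * standard_distance
    mean_adjust = sum(adjust) / len(adjust)
    return mean_travel + mean_adjust


# Insert distances from the center, in sixteenths of an inch (from the apparatus).
DISTANCES = [3, 9, 15, 21, 27, 33, 40, 56, 72, 88]
# Hypothetical readings in tenth-seconds (NOT the report's data).
TRAVEL = [4.0 + 0.2 * d for d in DISTANCES]
ADJUST = [10.0] * len(DISTANCES)

if __name__ == "__main__":
    for standard in (10, 50):
        total = mean_total(DISTANCES, TRAVEL, ADJUST, standard)
        print(f"{standard} sixteenths: mean total = {total:.1f} tenth-seconds")
```

The slope returned by `fit_line` plays the part of the travel rate and the intercept that of the starting time, as in the parenthetical remark above.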


Subjects 


Two former Navy radar operators (DMS and HWQ) were used in
all of the experiments. Two other young men (JDS and RFM) with
no such prior experience were available only for certain parts of the study.
These four subjects are right-handed. The young woman (JKD) used
in the study is naturally left-handed but was required to make settings
with her right hand. She also had had no particular mechanical
background.


Table 1
Influence of Ratio on Time and Potential
Standard Conditions

                            Mean Total Time
          10 Sixteenths Travel          50 Sixteenths Travel
          HWQ      JKD      RFM         DMS            HWQ      JKD

          29.0*    24.0*    -           75.6*          66.6*    53.6*
          24.1*    23.1*    35.1*       39.5*          42.9*    37.9*
          22.6*    22.4*    32.2        31.2*          35.4*    32.4*
          19.5     19.4     30.3        24.3           22.7     25.8
          21.6*    22.0*    29.1        [illegible]    26.0*    25.6
          20.2     23.9*    35.4        23.6           24.6     27.9
          23.1*    26.7*    37.3*       23.5           27.5*    30.7*
          25.3*    28.1*    37.3*       26.6           28.9*    32.5*
          33.3*    37.2*    47.4*       34.4*          36.5*    42.4*
          -        65.8*    -           57.9           -        73.0

                          Mean Total Potential
          10 Sixteenths Travel                   50 Sixteenths Travel
          DMS      HWQ      JKD      RFM         DMS            HWQ      JKD

          24.3*    29.9*    26.9*    -           [illegible]    78.7*    57.3*
          16.8*    20.8     19.5     27.3*       41.6*          46.8*    36.7*
          15.3     19.5     19.0     22.1        28.5*          35.1*    29.4
          14.4     19.7     20.3     20.3        23.2           28.1     28.3
          17.1*    16.4     21.2     17.5        25.1           22.6     26.8
          16.5*    18.4     20.5     20.5        21.3           20.8     24.9
          18.1*    16.4     21.9     25.8*       22.1           21.8     27.5
          19.7*    18.4     22.6*    26.6        23.3           22.0     27.0
          24.9*    23.4*    29.5*    33.4        26.9*          25.0     [illegible]
          25.4*    -        38.3*    -           28.1           -        [illegible]

* Significantly different from ratio 1.18.







Standard Conditions 


The following conditions were standard in all experiments, unless 
specific exception is noted: 


Linear scale: At eye level and normal reading distance.

Control knob: At waist level of seated subject; right-hand operation;
2¾" diameter knob.

Error-tolerance: .007" (pointer width of .025")

Ratios: Expressed in inches of pointer movement for one complete
turn of the knob.


Mean total time is expressed in tenth-seconds for 10 sixteenths or 50
sixteenths travel distance. Mean total potential is expressed in
meter-scale readings which have no absolute significance but are comparable
for different conditions in the same subject. Each mean is based on
a minimum of 80 readings. In tables showing italicized values, an
asterisk (*) indicates figures which differ significantly from the
italicized values, beyond the 1% level of confidence.

Fig. 2. Influence of ratio: standard conditions.


Results 


Influence of Ratio. Is there an optimal ratio? Table 1 shows mean 
total time and mean total potential for ten ratios varying from .220 
to 33.6 inches of pointer movement for one complete turn of the control 
knob. Although the subjects differ in their general levels, it is evident 
that the optimum is in the neighborhood of 1.18 in terms of both time and 
potential. 

Figure 2 shows why the optimal ratio is in this region. For all
subjects, travel time declines rapidly with increasing coarseness to about
1.18; thereafter coarser ratios do not speed up travel materially. In the
opposite fashion, adjusting time declines with decreasing coarseness of
ratio to about 1.18; thereafter finer ratios do not aid in making the final
adjustment. A ratio of about 1.18 combines rapidity of travel with speed
of final adjustment.
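Locating the optimum amounts to finding the ratio with the smallest mean total time. The sketch below does this for subject HWQ, using the mean total times for 10 sixteenths travel from Table 1. Pairing each time with a ratio in the order the ratios are listed in Table 2 is an assumption of this sketch, and only the first eight of the ten ratios are used because the remaining two are not legible.

```python
# Ratios in inches of pointer movement per knob turn (order as listed in Table 2).
RATIOS = [0.220, 0.454, 0.766, 1.18, 2.42, 4.08, 6.28, 9.70]
# HWQ, mean total time in tenth-seconds, 10 sixteenths travel (Table 1).
HWQ_TOTAL_TIME = [29.0, 24.1, 22.6, 19.5, 21.6, 20.2, 23.1, 25.3]


def optimal_ratio(ratios, total_times):
    """Return the (ratio, time) pair with the smallest mean total time."""
    return min(zip(ratios, total_times), key=lambda pair: pair[1])


if __name__ == "__main__":
    ratio, time = optimal_ratio(RATIOS, HWQ_TOTAL_TIME)
    print(f"Fastest ratio for HWQ: {ratio} ({time} tenth-seconds)")
```

For this subject the minimum falls at 1.18, with times rising on either side, which is the pattern the text attributes to slow travel at fine ratios and slow adjustment at coarse ones.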

For convenience in the remainder of this report we shall refer to 1.18
as "the optimal ratio." This should not be interpreted too literally.
Actually there is an optimal region which holds good for all the subjects
tested. Well-practiced subjects can use coarser ratios without undue
loss, but the ratio designated as optimal has proved satisfactory for
novice and expert alike.


Table 2 
Stability of the Optimal Ratio 


Standard conditions except that Feb. ’47 figures were obtained with a 2” diameter 
knob. 











Mean Total Time 


10 Sixteenths Travel 
Subject: DMS Subject HWQ 








Feb.  Apr.  May   Oct.  Mar.   Ratio   Feb.  Apr.  May   Oct.  Mar.
'47   '47   '47   '47   '48            '47   '47   '47   '47   '48

28.1 - 20.6 25.2 .220 29.0 33.9 29.0
20.3 22.4 17.5 .454 22.4 25.8 24.1
18.3 20.7 18.0 .766 20.4 25.7 22.6
16.9 18.4 16.3 1.18 20.3 23.7 19.5
16.7 22.1 19.1 2.42 20.3 9 23.1 21.6
18.4 22.7 19.2 4.08 20.7 25.3 20.2
19.8 21.7 19.5 6.28 24.4 33. 5 23.1
21.8 - 23.8 9.70 25.7 25.3
28.9 - 32.8 30.8 33.3
- 654.3 50.6

An indication of the stability of the optimal ratio over a period of time
is presented in Table 2, which shows data for two subjects gathered on
five different occasions over a period of thirteen months. Although the
level of performance fluctuates from time to time, the optimal ratio
remains in the same region.

Table 3 shows that the optimal ratio holds good for both the dominant 
and non-dominant hand. (To obtain these figures, a left-hand and a 
right-hand knob were coupled with auxiliary belts, so that the pointer 
could be set with either hand.) Particularly interesting here are the data 
for subject JKD. Although naturally left-handed, JKD had by this 
time become well-practiced in right-handed operation of the apparatus. 
At unfavorably high ratios she was now able to make faster settings with 
her right hand. Around the optimal ratio, the two hands were equally 
good. 


Table 3 
Ratios in Right vs. Left-Hand Operation 


Standard conditions except that identical right and left hand knobs were coupled 
by a belt so that either could be used. 





Mean Total Time 
10 Sixteenths Travel 
HWQ JKD 


Left Right Right Left 








.766 2. 24.4 25.5 25.0* 24.9 
1.18 ?1.: 24.6 24.8 22.5 23.6 
2.42 ‘ 24.3 24.7 24.7 22.4 
4.08 ; 25.0 25.1 27.6* 30.3* 
6.28 t 29.3* 29.8* 27.7* 33.7* 
9.70 9. 38.0* 37.6* 31.6* 36.4* 





* Significantly different from ratio 1.18. 


Influence of Knob Diameter. In a preliminary study on two subjects,
fourteen knob diameters were tested with five different ratios.
For clarity in presentation the fourteen diameters are grouped in five
step intervals. Table 4 gives the mean total time for 10 sixteenths travel
distance. Several points of interest appear: (1) Regardless of knob
diameter, the optimal ratio remains in the neighborhood of 1.18. (2) It
is apparently not possible to compensate for an unfavorable ratio by
altering the size of the control knob. Notice that the fastest times for
ratio 6.28 are longer than the slowest times for ratio 1.18. (3) With
coarse ratios the larger knob diameters work better. (4) At the optimal
ratio, knob diameter appears to make very little difference.

As a check on this last point, five knob diameters were studied at the
optimal ratio, using four subjects. Table 5 shows the results for both
time and potential. In terms of mean total time, only the half-inch
diameter is clearly unfavorable for all subjects, and the one-inch diameter
mildly so for two of them. In terms of action potential, the 2¾"
diameter is significantly superior to the smaller sizes, although not always
to the 4" diameter.


Table 4 


Interaction of Knob Diameter and Ratio 


Standard conditions except that series of knob diameter were combined with series 
of ratios as indicated. 








Mean Total Time
10 Sixteenths Travel

Subject HWQ

Knob               Ratio    Ratio
Diameters          2.42     4.08

½, ¾               29.2  -
1, 1¼, 1½          24.1  26.8  26.8
1¾, 2, 2¼          22.6  25.3  25.6  34.2
2½, 2¾, 3          23.6  27.0  25.7  33.0
3¼, 3½, 4          24.3  27.3  25.0  30.7

Subject DMS

Knob               Ratio    Ratio    Ratio    Ratio          Ratio
Diameters          1.18     2.42     4.08     6.28           9.70

½, ¾               21.5     -        -        [illegible]    [illegible]
1, 1¼, 1½          21.5     24.3     30.0     [illegible]    33.7
1¾, 2, 2¼          22.5     22.2     26.6     [illegible]    28.3
2½, 2¾, 3          21.6     22.5     26.3     [illegible]    29.6
3¼, 3½, 4          23.2     22.4     25.7     [illegible]    27.2





Figure 3 shows travel time and adjusting time separately. The half- 
inch diameter yields longer times for both travel and adjusting in all 
subjects. Among the larger sizes there is little to choose. It appears 
that the critical motion is the twist of the forearm, not the movement of 
the finger tips. Practically speaking, as long as the optimal ratio is used, 
the exact knob diameter does not matter, unless it is too small or too 
large to be grasped conveniently. The standard 2¾" size used in most
of our experiments was adopted simply because most subjects expressed 
a preference for this size. 

Table 5
Influence of Knob Diameter at Optimal Ratio

Standard conditions except that series of knob diameters were combined with ratio
of 1.18.

                                  Mean Total Time
               10 Sixteenths Travel                  50 Sixteenths Travel
Diam.          DMS      HWQ      JKD      RFM        DMS      HWQ      JKD

½              25.3*    28.1*    26.3*    42.1*      35.3*    38.1*    35.5*
1              23.1     23.0*    22.0     39.3*      31.5     29.4     29.6
[illegible]    21.1     22.9*    23.0     35.2       30.2     29.3     29.4
2¾             21.9     20.8     22.1     84.5       28.7     28.0     27.3
4              21.2     21.8     21.8     37.6       27.6     29.4     26.6

                                Mean Total Potential
               10 Sixteenths Travel                  50 Sixteenths Travel
Diam.          DMS      HWQ      JKD      RFM        DMS      HWQ      JKD

½              31.4*    29.2*    38.0*    33.4*      44.6*    40.0*    50.8*
1              30.9*    24.2     33.0*    27.1*      44.1*    35.0     44.2*
[illegible]    26.0*    25.5*    27.6     22.3*      38.4*    36.3*    39.2*
2¾             23.4     22.6     26.0     18.5       33.0     33.8     36.8
4              21.7     26.7*    24.9     13.6       31.7     37.5     35.2

* Significantly different from 2¾".


Influence of Crank Handle. Cranks are generally used in tracking
operations. The question has been raised whether a crank is better than
a knob for making discrete settings involving large amounts of travel.
To study this problem the 2¾" knob was drilled so that a crank handle
could be attached 44” from the periphery. Time measurements were
taken at seven ratios under the following conditions: (1) Knob alone as
a control; (2) crank attached and its use required; (3) crank attached
but its use optional.

Table 6 shows mean total time for 50 sixteenths travel distance, 
which should give the crank the maximum advantage. Two interesting 
points appear: (1) Although the crank speeds up setting at ratios below 
1.18, it does not enable these ratios to compete with the optimal ratio 
and the simple knob. (2) At the optimal ratio, the forced use of the 
crank is definitely deleterious and even its mere presence appears to 
hamper the best performance. Within the limitations of these experi- 
ments, at any rate, it appears that a crank handle serves no function 
whatever in making discrete settings on a linear scale. 


Table 6
Comparison of Knob and Crank

Standard conditions except each mean based on a minimum of 40 readings. Crank 
simulated by attaching crank-handle to periphery of 2¾” knob. In the table: KNOB 
means knob alone; CRANK means use of crank required; OPT means crank-handle 
present but use optional. 

Mean Total Time
50 Sixteenths Travel

           Subject: DMS          Subject: HWQ          Subject: JDS
Ratio    KNOB   CRANK  OPT     KNOB   CRANK  OPT     KNOB   CRANK  OPT
 .220    81.2   52.6   54.8    73.5   58.5   55.1    103.6  50.1   52.8
 .454    52.7   35.6   36.7    48.0   42.5   39.9    64.6   40.1   38.7
 .766    37.7   30.2   31.0    40.3   39.9   35.4    45.3   33.4   29.0
1.18     25.6   32.7   32.5    30.6   38.6   31.7    29.0   34.3   32.7
2.42     26.0   33.5   26.7    27.8   39.8   36.2    29.8   36.2   32.1
4.08     26.8   45.8   29.6    30.0   45.6   34.0    29.0   44.0   32.0
6.28     24.6   43.8   29.1    32.8   61.7   32.7    31.2   43.7   33.1
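
The two points drawn from Table 6 can be checked directly against the tabulated means. What follows is a minimal Python sketch and not part of the original report: the names RATIOS, KNOB, CRANK, OPT_AT_OPTIMAL, and crank_summary are illustrative assumptions, and the figures are transcribed from the KNOB and CRANK columns of Table 6 together with the OPT entries at ratio 1.18.

```python
# Mean total times (50 sixteenths travel) transcribed from Table 6.
# All names below are illustrative; they do not come from the report.
RATIOS = [0.220, 0.454, 0.766, 1.18, 2.42, 4.08, 6.28]
OPTIMAL = 1.18

KNOB = {
    "DMS": [81.2, 52.7, 37.7, 25.6, 26.0, 26.8, 24.6],
    "HWQ": [73.5, 48.0, 40.3, 30.6, 27.8, 30.0, 32.8],
    "JDS": [103.6, 64.6, 45.3, 29.0, 29.8, 29.0, 31.2],
}
CRANK = {
    "DMS": [52.6, 35.6, 30.2, 32.7, 33.5, 45.8, 43.8],
    "HWQ": [58.5, 42.5, 39.9, 38.6, 39.8, 45.6, 61.7],
    "JDS": [50.1, 40.1, 33.4, 34.3, 36.2, 44.0, 43.7],
}
# Crank handle present but optional, at the optimal ratio only.
OPT_AT_OPTIMAL = {"DMS": 32.5, "HWQ": 31.7, "JDS": 32.7}


def crank_summary(subject):
    """Return the figures behind points (1) and (2) for one subject."""
    i_opt = RATIOS.index(OPTIMAL)
    knob, crank = KNOB[subject], CRANK[subject]
    fine = range(i_opt)  # indices of the ratios below 1.18
    return {
        "crank_faster_at_fine_ratios": all(crank[i] < knob[i] for i in fine),
        "best_crank_at_fine_ratios": min(crank[i] for i in fine),
        "knob_at_optimal": knob[i_opt],
        "crank_at_optimal": crank[i_opt],
        "optional_at_optimal": OPT_AT_OPTIMAL[subject],
    }


for subject in KNOB:
    print(subject, crank_summary(subject))
```

For each of the three subjects the crank is faster than the knob at every ratio below 1.18, yet the best crank time at those ratios is still longer than the plain knob time at 1.18.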





Influence of Backlash. Backlash is unavoidably present in some 
equipment. What is its influence on the speed of making settings? To 
study this question, the apparatus was modified by the addition of an 
arm moving between adjustable stops immediately beyond the subject’s 
control knob, so that varying degrees of backlash could be introduced. 
In a preliminary series with two subjects, backlash was tested in 1° 
steps from 0° to 20° in the expectation that some particular amount of 
backlash might prove to be critical. Since this expectation was not 
realized, the figures have been grouped into seven step intervals. Table 
7 shows mean total time for 10 sixteenths travel at ratios 1.18 and 6.28. 
Surprisingly, backlash appears to have very little effect, even at the 
unfavorably coarse ratio of 6.28. 

As a further check, backlash of 0°, 4°, 8°, 12°, and 16° was tested with 
three subjects using the optimal ratio. Results are given for mean total 
time and mean total potential in Table 8. Again it seems that no sub- 
stantial effect of backlash can be found in either time or action potential. 
There is a slight upward trend with increasing backlash, but the statis- 
tically significant differences are scattered spottily and unconvincingly 
throughout the table. Figure 4 indicates that the slight upward trend 
comes from a minor lengthening of adjusting time, while travel time 
remains unaffected. 


Table 7
Interaction of Backlash and Ratio

Standard conditions. Varying degrees of backlash introduced by means of an arm 
working between adjustable stops, immediately beyond subject’s control knob. 

Mean Total Time
10 Sixteenths Travel

                Subject: DMS        Subject: HWQ
Backlash        Ratio    Ratio      Ratio    Ratio
in Degrees      1.18     6.28       1.18     6.28
0, 1, 2         23.1     27.8       24.4     29.2
3, 4, 5         23.2     30.1       24.9     28.1
6, 7, 8         23.8     32.5       25.8     28.7
9, 10, 11       25.4     33.0       26.4     30.1
12, 13, 14      25.1     32.7       26.4     32.2
15, 16, 17      26.1     32.5       26.2     30.7
18, 19, 20      26.5     33.3       26.6     29.7
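
One way to put a number on "very little effect" is to fit a straight line to each column of Table 7. The sketch below is an illustration only, not the authors' analysis: it assumes each three-degree group can be represented by its middle value, takes the second HWQ column to be ratio 6.28 as the text indicates, and uses hypothetical names (BACKLASH_DEG, TIMES, slope, SLOPES).

```python
# Mean total times (10 sixteenths travel) transcribed from Table 7.
# Assumption: each backlash group is represented by its middle value in degrees.
BACKLASH_DEG = [1, 4, 7, 10, 13, 16, 19]

TIMES = {
    ("DMS", 1.18): [23.1, 23.2, 23.8, 25.4, 25.1, 26.1, 26.5],
    ("DMS", 6.28): [27.8, 30.1, 32.5, 33.0, 32.7, 32.5, 33.3],
    ("HWQ", 1.18): [24.4, 24.9, 25.8, 26.4, 26.4, 26.2, 26.6],
    ("HWQ", 6.28): [29.2, 28.1, 28.7, 30.1, 32.2, 30.7, 29.7],
}


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Change in tabulated mean time per additional degree of backlash.
SLOPES = {key: slope(BACKLASH_DEG, ys) for key, ys in TIMES.items()}

for (subject, ratio), b in SLOPES.items():
    print(f"{subject}, ratio {ratio}: {b:+.3f} time units per degree")
```

Under these assumptions every fitted slope is positive but well below one tabulated time unit per degree of backlash.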





We are reluctant to draw the sweeping conclusion that backlash is 
totally unimportant under all conditions. Perhaps with excessive friction 
or inertia, perhaps when far greater accuracy than .007” is demanded, 
backlash may prove more disturbing than in the present experiments. 
Those are questions for further research to answer. 

Influence of Error-Tolerance. How much does it slow up an operator 
to demand greater accuracy in setting? In our apparatus the error- 
tolerance could be altered simply by changing the width of the pointer 
in relation to the width of the lucite inserts. In a preliminary series, 
eleven pointer-widths were tested. Table 9 shows the results in terms 
of mean total time for 10 sixteenths travel distance. At the optimal 
ratio, only subject DMS shows a marked lengthening of time with decreasing 
tolerance; but at ratio 6.28 all three subjects show the same effect. 

Table 8
Influence of Backlash at Optimal Ratio

Standard conditions. Varying degrees of backlash introduced by means of an arm 
working between adjustable stops immediately beyond subject’s control knob. 

Mean Total Time

            10 Sixteenths Travel      50 Sixteenths Travel
Backlash    DMS     HWQ     JKD       DMS     HWQ     JKD
None        21.9    2.9     23.7      31.1    30.9    31.7
4°          22.0    23.8    23.4      30.4    31.0    31.4
8°          23.4    25.5*   26.6      32.6    33.5    34.6
12°         24.2*   24.1    28.6*     34.2*   31.7    37.4*
16°         26.8    24.5    26.6      36.4*   33.3    34.6

Mean Total Potential

            10 Sixteenths Travel      50 Sixteenths Travel
Backlash    DMS     HWQ     JKD       DMS     HWQ     JKD
None        25.7    23.9    32.9      38.9    36.7    43.7
4°          28.0*   24.2    31.2      40.8    36.2    42.0
8°          26.4    26.6*   32.7      39.2    37.0    45.5
12°         26.7    25.3    33.9      39.5    37.3    46.3
16°         29.0*   26.5*   32.7      42.2    37.7    45.5

* Significantly different from None.

Fig. 4. Influence of backlash (standard conditions). 

Table 9
Interaction of Tolerance and Ratio

Standard conditions except that knob diameter is 2”. 

Mean Total Time
10 Sixteenths Travel

                  Subject: DMS      Subject: HWQ      Subject: JDS
Error             Ratio   Ratio     Ratio   Ratio     Ratio   Ratio
Tolerance         1.18    6.28      1.18    6.28      1.18    6.28
.018”, .016”      17.0    17.8      19.5    21.7      -       -
Sis”, Bi”         16.5    21.0      20.7    23.2      22.1    27.9
.009”, .008”      18.2    24.4      22.7    27.7      22.7    30.1
.007”, .006”      24.2    28.6      22.7    30.0      24.2    30.0
.005”             26.5    37.1      24.1    29.5      29.9    33.4
.004”             30.0    52.1      25.2    33.2      32.7    39.9
.003”             35.3    50.2      29.0    39.2      33.9    40.5





A further study was made with four subjects, using five tolerances 
at the optimal ratio, measuring both time and potential. Table 10 gives 
the results. There is evidence of a moderate lengthening of time from 
.012” to .005”; then a sharp break at .003”. From the reports of the 
subjects, it appears that .003” represents a breaking-point at which it 
becomes perceptually impossible to judge whether the pointer is accurately 
positioned. This is borne out by the fact that only at this level of 
tolerance did the subjects have an appreciable number of “red lights” 
(indicating that the clutch was released when the pointer was not within 
the confines of the lucite insert). 

Table 10
Influence of Tolerance at Optimal Ratio

Standard conditions except that series of error-tolerances were tested at ratio of 1.18. 

Mean Total Time

          10 Sixteenths Travel              50 Sixteenths Travel
Toler.    DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
.012”     15.8*   19.0*   16.6*   27.9*     22.8    25.4*   23.0    38.3*
.009      17.1    19.5*   18.3    31.4      23.9    26.3    24.7    40.2
.007      17.6    22.6    19.8    34.6      24.7    27.8    25.0    45.0
.005      20.7*   23.4    21.8*   38.1      27.9    31.0*   27.4    48.9
.003      27.7*   30.4*   25.9*   51.6*     33.3*   37.2*   32.3*   61.6*

Mean Total Potential

          10 Sixteenths Travel              50 Sixteenths Travel
Toler.    DMS     HWQ     JKD     RFM       DMS     HWQ     JKD     RFM
.012”     14.1*   14.3*   21.4*   19.6      22.1*   33.7    30.6    30.0
.009      14.9    15.7    22.0*   19.5      23.2    22.9    31.6    29.9
.007      16.9    15.8    24.8    21.6      24.3    23.0    33.6    32.8
.005      17.2    19.6*   23.4    23.4      25.2    27.2*   31.8    34.6
.003      21.5*   23.7*   27.6*   27.2*     29.0*   30.5*   36.4    37.6*

* Significantly different from .007.
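
The contrast between the moderate lengthening down to .005” and the sharp break at .003” can be expressed as a rate of increase per .001” of tightening. The following Python sketch is an illustration under stated assumptions, not the authors' computation: it uses the Mean Total Time columns for 50 sixteenths travel from Table 10, and the names TOLERANCE_IN, TIMES, and rise_per_thousandth are hypothetical.

```python
# Mean total times (50 sixteenths travel, ratio 1.18) transcribed from Table 10.
# Names are illustrative; tolerances are in inches.
TOLERANCE_IN = [0.012, 0.009, 0.007, 0.005, 0.003]

TIMES = {
    "DMS": [22.8, 23.9, 24.7, 27.9, 33.3],
    "HWQ": [25.4, 26.3, 27.8, 31.0, 37.2],
    "JKD": [23.0, 24.7, 25.0, 27.4, 32.3],
    "RFM": [38.3, 40.2, 45.0, 48.9, 61.6],
}


def rise_per_thousandth(times, start, end):
    """Increase in mean time per .001 inch of tightening from start to end."""
    i, j = TOLERANCE_IN.index(start), TOLERANCE_IN.index(end)
    thousandths = round((start - end) * 1000)
    return (times[j] - times[i]) / thousandths


for subject, times in TIMES.items():
    gentle = rise_per_thousandth(times, 0.012, 0.005)
    steep = rise_per_thousandth(times, 0.005, 0.003)
    print(f"{subject}: {gentle:.2f} per .001 in. down to .005, "
          f"{steep:.2f} per .001 in. from .005 to .003")
```

For all four subjects the rise per .001” between .005” and .003” is more than three times the rise per .001” between .012” and .005”.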

Figure 5 shows, as might be expected, that error-tolerance does not 
affect travel time. Adjusting time increases slowly as tolerance decreases, 
with a sharp upward break at .003”. 

Fig. 5. Influence of tolerance (standard conditions). 


It should be realized that .003” represents a perceptual limit only 
under the conditions of this experiment; i.e., centering a pointer of 
appreciable thickness on a lighted insert. With ideal conditions, such 
as a fine hair line, it might be expected that the perceptual limit would 
be considerably lower. 


Summary 


In the foregoing experiments, the subject was required to move a 
pointer by means of a control knob and set it to a position on a linear scale 
indicated by a lighted insert. Time consumed in making the setting 
and the relative action potential developed in the active forearm were 
measured separately for travel to approximate location and for final 
adjustment. Systematic variations in ratio, knob diameter, backlash, 
etc., were introduced. Three to five subjects were used in the various 
parts of the study. The principal results follow: 

1. The optimal ratio is one or two inches of pointer movement for one 
complete turn of the knob, for either the dominant or non-dominant 
hand. Finer ratios waste time and effort in traveling to the approximate 
location. Coarser ratios are clumsy for making the final adjustment. 
No other design factor investigated is as important as the optimal ratio. 

2. Knob diameter is relatively unimportant, as long as the knob is 
large enough to be grasped conveniently. An unfavorably coarse ratio 
cannot be compensated for by altering the size of the control knob. 

3. An unfavorably fine ratio cannot be compensated for by sub- 
stituting a crank handle for the control knob. When the optimal ratio 
is employed, the addition of a crank handle to the knob does not aid and 
may be actually harmful, even when its use is optional. 

4. Backlash, even in excessive amounts, has a relatively minor in- 
fluence on either time or potential at the optimal ratio—under the condi- 
tions of this experiment. This may not be true under conditions of ex- 
treme friction and inertia, or when a tolerance much finer than .007” is 
required. 

5. Demanding greater accuracy of the subject by reducing the per- 
mitted error-tolerance increases time and potential only moderately, as 
long as the optimal ratio is employed. The final limit of accuracy in the 
present experiments appeared to be set by the perceptual difficulty of 
centering a pointer of appreciable thickness on a lighted insert, rather 
than by the limits of motor control. 


Received April 18, 1949. 
Early publication. 








Book Reviews 


Lewin, Kurt (Edited by Gertrude Weiss Lewin). Resolving social con- 
flicts. Selected Papers on Group Dynamics. New York: Harper 
and Brothers, 1948. xviii-230. 

In his Foreword Gordon Allport writes such an excellent review of 
this book that the temptation to quote him liberally is too strong to be 
resisted. The thirteen papers, all previously published elsewhere, are, 
he says, ‘‘so well-selected and so adroitly arranged that they provide an 
excellent introduction to Lewin’s system of thought” (p. XIV). “The 
unifying theme is unmistakable: the group to which an individual belongs 
is the ground for his perceptions, his feelings, and his actions. Most 
psychologists are so preoccupied with the salient features of the indi- 
vidual’s mental life that they are prone to forget it is the ground of the 
social group that gives to the individual his figured character. . . . This 
interdependence of the ground and the figured flow is inescapable, inti- 
mate, dynamic, but it is also elusive” (p. VII f.). 

“Lewin’s outstanding contribution is his demonstration that the 
interdependence of the individual and the group can be studied in better 
balance if we employ certain new concepts. Although the present volume 
contains primarily papers having a concrete, case-anchored character, 
still each shows with clarity how fruitful these new concepts are for under- 
standing the phenomenon in question” (p. VIII.). Here, I think, we 
must be more cautious than Allport. We do not, it is true, quite share 
the objection sometimes made that Lewin’s terminology is “meta- 
phorical.” All description consists in calling attention to similarities, 
and all terms are therefore metaphorical. What we should ask of the 
scientist proposing a new term is that he make clear the limits of gen- 
erality involved. Is psychological or “life space” like geometric space 
in every respect? Lewin says it has all the qualities ascribed to space in 
non-quantified geometry (i.e., in topology). Presumably the life space 
has some but not all the characteristics of the more-familiar Euclidean 
space. Thus the term will for a long time have for us a strongly analogical 
coloration; it will suggest, that is, some properties which it does not have. 

The merits of such a new way of describing facts must not, however, 
be overlooked. ‘Psychological or life space’ suggests parallels which 
are actually confirmable hypotheses. The volume of significant re- 
search which has been set in motion by Lewin’s array of terms is a tribute 
to their provisional utility. It is the reviewer’s belief that they will 
greatly illuminate also many of the social psychological problems in 
industry. 

The more basic question comes when we consider the terms as ex- 
planatory concepts or constructs. Here fecundity in suggesting hy- 
potheses is not an adequate criterion. Nor can we accept Allport’s 
criterion of “understanding the phenomena in question.” It is rare for 
concepts to seem unworkable in the concrete situation to which their 
own author seeks to apply them. A construct must prove itself in terms 
of stability in systematically varying conditions. In social psychology it 
may be years before the constructs can be tested in the requisite variety 
of critical situations. 

Meanwhile, we do find here provocative interpretations of current 
problem situations. Part I deals chiefly with the problem of democratic 
re-education with particular reference to Germany. Part II deals with 
“Conflicts in Face-to-Face Groups.” Part III, dealing chiefly with 
minority group problems, is somewhat more miscellaneous. The last 
chapter is significant because it reveals Lewin right up to the moment of 
his untimely death striving to see how, through action research, his 
hypotheses could be put to a genuinely experimental test. All persons 
interested in social engineering will find stimulation in this book. 


Horace B. English 
Ohio State University 


Yoder, Dale, Paterson, Donald G., et al. Local labor market research. 
Minneapolis, Minnesota: University of Minnesota Press, 1948. Pp. 
xvii, 226. $3.50. 

Early in 1939 officials of the city of St. Paul, Minnesota became aware 
of an apparent paradox. Although employment had been restored to 
proportions equal to those of the predepression period, relief loads and 
expenditures continued at the high levels typical of the depression years. 
A Mayor’s Committee on Unemployment studied the problem but found 
no satisfactory explanation. Finally the committee turned to social 
scientists at the University of Minnesota for help with the problem. In 
the early 30’s the Employment Stabilization Research Institute of the 
University had made a series of significant studies of employment and 
unemployment and was thus uniquely equipped in 1940 to attack the 
immediate problem facing the city of St. Paul. The story of the research 
efforts of the ESRI during the years 1940—42 is reported in Local Labor 
Market Research. 

The significance for psychologists of this account arises in part from 
the cooperative nature of the enterprise since the research staff included 
psychologists as well as economists, sociologists, and statisticians. Much 
of the methodology will interest applied psychologists in the fields of 
opinion polling, counseling, and personnel administration. Finally, the 
findings, particularly of Project 3, constitute important new contributions 
to personnel psychology. 

After a one-year pilot study it was apparent that the research program 
should include a comprehensive study of the labor marketing process. 
Five projects were selected for study. 

Project 1 appraised available employment data, particularly those of 
state and federal agencies, and attempted to improve these labor market 
reports as a means of providing continuing indices of employment, hours, 
wage rates, and earnings. These measures were based on employer 
reports to various public and private agencies and covered only the 
employed. 

Project 2 sought to provide detailed information on the numbers and 
types of labor supplies available and to serve as a check on the data ob- 
tained in the first project. In addition, special studies were undertaken 
to obtain information regarding priorities unemployment, civilian morale, 
nature and extent of vocational training, housing, shopping habits, trans- 
portation, and migration. The method was a continuous sampling survey 
using both a panel and randomly selected respondents in St. Paul. The 
result is an impressive demonstration of the use of sampling techniques 
in maintaining a continuing check on the dynamic elements of the labor 
market and in providing basic information on a wide range of community 
problems. 

Project 3 concerned itself with some of the frictions in the labor market 
which interfere with the matching of men, women, and jobs. Psycho- 
logical tests, interviews, and attitude surveys were among the tools used 
in studying the human factor in employment. 

An attempt was made to relate available employment data to training 
opportunities available in the community. Data on school enrollments 
and the employment experiences of post-graduate youth were collected. 
The findings raised the question as to how well the public school system 
had fulfilled its responsibilities for vocational training. 

Opinion polling methods were used to identify and measure attitudes 
and attitude changes among various occupational groups. Findings 
indicated that members of the labor market often held opinions at 
variance with the facts and this doubtless accounted for some of the labor 
force frictions. It was possible to get some idea of the job satisfaction of 
various occupational groups through this polling approach. Questions 
regarding public policy such as “Do you think it is too easy for people to 
get on relief?” got at attitudes which indirectly affect employment policies. 

Of great interest to personnel psychologists is that portion of the 
study which compared the occupational classification assigned on the 
basis of intensive clinical study to unemployed job seekers with those 
classifications routinely assigned by employment office interviewers. 
The results indicate that clinical study rather than a superficial appraisal 
based primarily on past job experience will identify a considerable 
number of persons whose potentialities for employment otherwise go 
undiscovered. 

This intensive clinical study of almost four hundred unemployed 
persons yielded other useful information. For example, it was found that 
counseling letters may be useful in a large-scale counseling program where 
time for interviews is at a premium. Other analyses gave information 
on the dominant causes of unemployment. 

A follow-up study of persons tested and studied clinically ten years 
previously indicated that occupational adjustment can be predicted with 
surprising accuracy. Re-tests on these same people gave amazingly 
high re-test correlations, those on pencil and paper tests being about .9. 
Correlations for performance tests were somewhat lower, being in the 
neighborhood of .6 to .7. 

Project 4 was an attempt to tease out some of the complex interrela- 
tionships which influence the demand for labor. Analyses of economic 
data and opinion surveys were the methods used. The latter sought to 
secure and classify employers’ opinions as to how and why they make 
decisions to offer employment. In a study of the printing industry the 
employees were also polled to ascertain any divergencies. 

Project 5 was an analysis of relief administration policies and practices 
on the assumption that factors other than those of the labor market might 
be responsible for the St. Paul paradox of increasing employment without 
an accompanying decrease in relief rolls. Analyses of official reports of 
social work agencies provided one source of data. A major part of the 
study, however, was an intensive analysis of the characteristics of relief 
recipients. Finally, detailed study was made of fifteen relief clients for 
whom a great deal of information was available as a result of their partici- 
pation in the occupational analysis work of Project 3. 

The findings of Project 5, as a whole, indicated that the nature and 
conditions of relief administration were an important factor in the 
situation. It seemed clear that relief expenditures reflected much more 
than the current condition of local labor markets. 

Both the conduct of this research program and the nature and form 
of publication were materially affected by the war. Changes in personnel 
and finally the withdrawal of foundation support because of war condi- 
tions brought the study to an end before it really was completed. Thus 
this book is more of a progress report emphasizing methodology than it 
is a definitive statement of the findings. The compilation and publication 
of this report actually was undertaken by the Industrial Relations Center 
established at the University of Minnesota in 1945. It is the work of 
many authors and reflects some of the obvious limitations. Credit for 
a careful editing should go, however, to Herbert G. Heneman, Jr. 

This is a unique and important contribution to labor market research 
and is a milestone marking the road which psychology is traveling 
toward cooperative research on meaningful problems. At a time when 
“action research” has become a fashionable term among social scientists 
the reviewer judges this report to be a significant demonstration of the 
application of psychological viewpoint and methodology to pressing 
social problems. This categorization as action research, be it noted, is 
expressed at the risk of embarrassing the directors of the enterprise who 
have long engaged in the study of real problems without benefit of a more 
esoteric terminology applied to their highly productive efforts. 


Arthur H. Brayfield 
University of California 


Jucius, Michael J. Personnel management. Chicago: Richard D. Irwin, 
Inc., 1948. xii + 696 pp. $6.00. 

Personnel management is defined as “the field of management which 
has to do with planning, organizing, and controlling the performance of 
various activities concerned with procuring, developing, maintaining, and 
utilizing a labor force such that the objectives and purposes for which the 
company is established are attained as effectively and economically as 
possible, and of labor itself are served to the highest possible degree.” 

Around this definition Jucius has written a college textbook designed 
to provide a “realistic study of the principles and practices of personnel 
management.” The thirty chapters deal systematically with organizational 
problems, approaches, and techniques in selecting, training, remunerating, 
and motivating employees and in maintaining satisfactory 
labor-management relations. 

The presentation is in the typical textbook fashion. It is well- 
organized and will lend itself to outlining in the student’s notebook. 
The emphasis seems to be upon presenting a body of material to be studied 
under the guidance of a qualified instructor rather than upon providing 
a self-motivating treatise for the general reader. It differs from the 
standard textbook, however, in that supporting source materials are 
rarely given. Footnote references are infrequent and there are no sug- 
gested additional readings for separate chapters. 

The chief merit of the text is its well-organized and systematic presen- 
tation of a wealth of information about personnel practices and principles. 
It is chuck-full of step-by-step procedures, examples of forms, and 
practical suggestions for approaching the common problems faced by a 
personnel department. It emphasizes the importance of “getting the 
facts” and of careful follow-up and control after the appropriate steps 
have been taken. The final chapter stresses the need for a research point- 
of-view which would lead to continuous intensive study of all aspects of 
personnel management. 

Since this review is written by a psychologist for psychologists it is 
pertinent to look for evidence of the impact of psychological findings upon 
personnel practices as described, even though the author is not writing a 
text on personnel psychology. In this respect the presentation is rather 
weak. Recognition is made of individual differences and of the im- 
portance of employee attitudes and feelings and, rather frequently, some 
rather cogent observations on human nature are reflected in common 
sense statements. There is little overt recognition, however, of the 
dynamic nature of interpersonal relationships, of the fundamental prob- 
lem of democracy in industry, of the individual as a person rather than as 
an employee. The areas in which psychology has made specific contri- 
butions in industry are the most poorly presented, viz., interviewing, 
counseling, and testing. The influence of the social structure in company 
organization is not described, the Hawthorne studies being referred to 
merely as an example of research. 

In summary, “Personnel Management” will serve as an excellent 
textbook in the field of business administration if supplemented by 
source materials, if livened up by a stimulating instructor, and if the 
students also take courses in personnel and industrial psychology. 


Albert S. Thompson 
Teachers College, Columbia University 


Lall, Sohan. Mental measurement. Allahabad: Allahabad Law Journal 
Press, 1948. Pp. 88. 

This little book presents results obtained from the administration of 
three tests to approximately 2000 Indian children in 58 government high 
schools. The children were 11+ years old and the tests were a group 
verbal intelligence test, an English language test, and an arithmetic test. 
No data are given on the construction of any of the tests; the author 
states simply that they were patterned after the Moray House tests of 
Godfrey Thomson. The tests in English and arithmetic were achieve- 
ment examinations in these areas. 

Distributions of test scores in the entire sample are presented and 
the method used for removing the skewness which appeared in all three 
distributions is defended. Perhaps the most interesting part of the 
monograph is the presentation of comparative test scores for four Indian 
castes, for children from different geographical regions, and for children 
whose parents fell into various occupational groupings. Most of the 
differences are quite small but some are statistically significant. It seems 
likely, however, that the apparent significance was, in many cases, a 
resultant of the small and probably unrepresentative samples. How 
representative the samples were we have no way of judging. 

On the whole the monograph is somewhat amateurish and reminds 
one of publications in this country of some 25 years ago. This is a 
pioneer job, however, done under considerable difficulties, and the author 
deserves a great deal of credit. The references in the book are to Thom- 
son, Spearman, and Burt, under whom the author apparently had his 
training. 


Henry E. Garrett 
Department of Psychology 
Columbia University 


Evans, Ralph M. An introduction to color. New York: John Wiley and 
Sons, 1948. Pp. x + 340. $6.00. 


Any serious treatise on color is a major undertaking which necessitates 
the coordination of materials from physics, physiology, and psychology. 
This book was written with the avowed purpose of giving adequate treat- 
ment to materials from each of these three fields. Each phase is treated 
separately and then the three are interwoven near the end of the book. 
Consistent, understandable terminology is achieved by employing com- 
mon speech meanings of words, with a minimum number of new words 
introduced and defined. Many pictures and graphs are employed to 
help the reader grasp the fundamentals. To a large degree the text is 
descriptive and non-mathematical. Although it is not assumed that the 
reader has more than an elementary knowledge of physics and psychology, 
no simplifying omissions of subject matter are made. The author, head 
of the Color Control Department of the Eastman Kodak Company, is 
attempting to give the reader the benefit of his twenty years’ practical 
experience in the field. 

In this book, the author has been fairly successful in achieving his 
aims. The material is not a popular treatise, but a simplified technical 
discussion of highly complex and technical subject matter. Although 
not easy reading, persistent study of the material will be found rewarding. 
It is the only book known to the reviewer that attempts to give such a 
complete story of color. There is somewhat more emphasis upon physio- 
logical and physical than upon psychological aspects. Nevertheless, the 
psychologist will profit greatly by reading the book. Especially he will 
be able to correct many inaccurate notions obtained from elementary 
discussions. 

One wonders why a discussion of geometric optical illusions is included 
in this treatise on color. Furthermore, to include the ambiguous 
staircase as an illusion is erroneous. The book would be more complete 
if a thorough discussion of color experiences of partially (red-green) color 
blind persons were included. Another item that would improve the 
treatise is a more complete discussion of color in illumination, and lighting 
in relation to color in interior decoration. 

Some of the more important sections deal with the use of colors in 
photography, art and display situations. In general, this book is well or- 
ganized and clearly written. It will be useful both to those interested in 
the fundamental principles of color and to those working with color 
applications in practical situations. 

Miles A. Tinker 
University of Minnesota 